Abstract

Consider a large Boolean network with a feed-forward structure. Given a probability distribution on the inputs, can one find, possibly small, collections of input nodes that determine the states of most other nodes in the network? To answer this question, a notion that quantifies the determinative power of an input over the states of the nodes in the network is needed. We argue that the mutual information (MI) between a given subset of the inputs $\mathbf{X} = \{X_1, \dots, X_n\}$ of some node $i$ and its associated function $f_i(\mathbf{X})$ quantifies the determinative power of this set of inputs over node $i$. We compare the determinative power of a set of inputs to the sensitivity to perturbations of these inputs and find that, maybe surprisingly, an input that has large sensitivity to perturbations does not necessarily have large determinative power. However, for unate functions, which play an important role in genetic regulatory networks, we find a direct relation between MI and sensitivity to perturbations. As an application of our results, we analyze the large-scale regulatory network of Escherichia coli. We identify the most determinative nodes and show that a small subset of those reduces the overall uncertainty of the network state significantly. Furthermore, the network is found to be tolerant to perturbations of its inputs.



1 Introduction 

A Boolean network (BN) is a discrete dynamical system that is used, for example, to study and model a variety of biochemical networks such as genetic regulatory networks. BNs were introduced in the late 1960s by Kauffman [1,2], who proposed to study random BNs as models of gene regulatory networks. Kauffman investigated their dynamical behavior and a phenomenon called self-organization. Aside from this original purpose, BNs have also been used to model (small-scale) genetic regulatory networks; for example, in [3-5], it was demonstrated that BNs reproduce the underlying biological processes (i.e., the cell cycle) well. BNs are also used to model large-scale networks, such as the Escherichia coli regulatory network [6], which is analyzed in Section 6. In contrast to Kauffman's automata and the regulatory networks considered in [3-5], this network is not an autonomous system, since the genes' states are determined by external factors.
In the literature addressing the analysis of BNs, it is common to consider measures that quantify the effect of perturbations. Whether a random BN operates in the so-called ordered or disordered regime is determined by whether a single perturbation, i.e., flipping the state of a node, is expected to spread or to die out eventually. Kauffman [2] argues that biological networks must operate at the border between the ordered and the disordered regime; hence, they must be tolerant to perturbations to some extent.

In contrast to measures of perturbations, determinative power in BNs has not received much attention, even though there are several settings where such a notion is of interest. For example, given a feed-forward network where the states of the nodes are controlled by the states of the nodes in the input layer, we might ask whether a possibly small set of inputs suffices to determine most states, i.e., reduces the uncertainty about the network's states significantly. This can be addressed by quantifying the determinative power of the input nodes. In the E. coli regulatory network, for instance, it turns out that a small set of metabolites and other inputs determines most genes that account for E. coli's metabolism (see Section 6).

In this paper, we view the state of each node in the network as an independent random variable. This modeling assumption applies to networks with a tree-like topology, e.g., a feed-forward network, and is often made when studying the effect of perturbations. In this setting, the determinative power of nodes and perturbation-related measures are properties of single functions; hence, the analysis of the BN reduces to the analysis of single functions. Our main tool for the theoretical results is Fourier analysis of Boolean functions. Fourier analytic techniques were first applied to BNs by Kesseli et al. [7,8], who derived results related to Derrida plots and the convergence of trajectories in random BNs. Ribeiro et al. [9] considered the pairwise mutual information in time series of random BNs, under a different setup than the one we use: in [9], the functions are random, whereas here, the functions are deterministic, but their argument is random. Finally, note that part of this paper was presented at the 2012 International Workshop on Computational Systems Biology [10].

1.1 Contributions 

Mutual information between a set of inputs to a node and the state of this node is a measure of the determinative power of this set of inputs, as mutual information quantifies the mutual dependence of random variables. In order to understand determinative power and mutual dependencies in Boolean networks, we systematically study the mutual information between sets of inputs and the state of a node. We relate mutual information to a measure of perturbations and prove that (maybe surprisingly) a set of inputs that is highly sensitive to perturbations does not necessarily have determinative power. Conversely, a set of inputs that has determinative power must be sensitive to perturbations. To prove these results, we show that the concentration of weight in the Fourier domain on certain sets of inputs characterizes a function in terms of tolerance to perturbations and determinative power of input nodes. Furthermore, we generalize a result by Xiao and Massey [11], which gives a necessary and sufficient condition for statistical independence of a set of inputs and a function's output in terms of the Fourier coefficients. This result can, for instance, be applied to decide for which classes of functions the algorithm presented in [12], which detects functional dependencies based on estimating mutual information, can succeed and for which it fails. For unate functions, we show that any relevant input and the function's output are statistically dependent, and we provide a direct relation between the mutual information and the influence of a variable. The class of unate functions is especially relevant for biological networks, as it includes all linear threshold functions and all nested canalizing functions, and it describes functional dependencies in gene regulatory networks well [13]. As an application of the theoretical results in this paper, we show that mutual information can be used to identify the determinative nodes in the large-scale model of the control network of E. coli's metabolism [6].

1.2 Outline 

The paper is organized as follows. Boolean networks and Fourier analysis of Boolean functions are reviewed in Section 2. In Section 3, the influence and average sensitivity as measures of perturbations are reviewed, and their relation to the Fourier spectrum is discussed. In Section 4, we study the mutual information of sets of inputs and the function's output. Section 5 is devoted to unate functions. Section 6 contains an analysis of the large-scale E. coli regulatory network, using the tools and ideas developed in the previous sections.

2 Preliminaries 

We start with a short introduction to Boolean networks 
and Fourier analysis of Boolean functions, and introduce 
notation. 

2.1 Boolean networks 

A (synchronous) BN can be viewed as a collection of $n$ nodes with memory. The state of a node $i$ is described by a binary state $x_i(t) \in \{-1, +1\}$ at discrete time $t \in \mathbb{N}$. Choosing the alphabet to be $\{-1, +1\}$ rather than $\{0, 1\}$, as is more common in the literature on BNs, will turn out to be advantageous later; both choices are, however, equivalent. The state of the network at time $t$ is described by the vector $\mathbf{x}(t) = [x_1(t), \dots, x_n(t)] \in \{-1, +1\}^n$. The network dynamics are defined by

$$x_i(t+1) = f_i(\mathbf{x}(t)), \tag{1}$$

where $f_i : \{-1, +1\}^n \to \{-1, +1\}$ is the Boolean function associated with node $i$. At time $t = 0$, an initial state $\mathbf{x}(0) = \mathbf{x}_0$ is chosen. In general, not all arguments of a function $f_i(\mathbf{x})$ need to be relevant. The variable $x_j$, $j \in \{1, \dots, n\}$, is said to be relevant for $f_i$ if there exists at least one $\mathbf{x} \in \{-1, +1\}^n$ such that changing $x_j$ to $-x_j$ changes the function's value. In most BN models in biology, the functions depend on a small subset of their arguments only. Furthermore, not every state must have a function associated with it; states can also be external inputs to the network.
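The synchronous dynamics (1) can be illustrated with a minimal Python sketch. The three-node network below is a hypothetical toy example chosen only for illustration (node 0 is an external input, node 1 computes the AND of nodes 0 and 2, node 2 negates node 1); it is not taken from any of the cited models. States use the $\{-1, +1\}$ alphabet, and all names are our own.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]          # one entry in {-1, +1} per node
Rule = Callable[[State], int]

def step(x: State, rules: Dict[int, Rule]) -> State:
    """One synchronous update x_i(t+1) = f_i(x(t)) for all nodes at once.

    Nodes without a rule are treated as external inputs and keep their state.
    """
    return tuple(rules[i](x) if i in rules else x[i] for i in range(len(x)))

def trajectory(x0: State, rules: Dict[int, Rule], steps: int) -> List[State]:
    """The states x(0), x(1), ..., x(steps)."""
    states = [x0]
    for _ in range(steps):
        states.append(step(states[-1], rules))
    return states

# Hypothetical toy network; node 0 is an external input.
rules: Dict[int, Rule] = {
    1: lambda x: 1 if x[0] == 1 and x[2] == 1 else -1,  # AND of nodes 0 and 2
    2: lambda x: -x[1],                                   # NOT of node 1
}

for t, x in enumerate(trajectory((1, -1, -1), rules, 4)):
    print(t, x)
```

With the input fixed at +1, the state returns to $(1, -1, -1)$ after four steps, i.e., the trajectory runs through a cycle of length 4; with the input fixed at -1, node 1 is forced to -1 and the network settles into the fixed point $(-1, -1, 1)$.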

To study the determinative power and tolerance to perturbations, a probabilistic setup is needed. In our analysis, we assume that each state is an independent random variable $X_i$ with distribution $P[X_i = x_i]$, $x_i \in \{-1, +1\}$. The assumption of independence holds for networks with a tree-like topology, but not for networks with strong local dependencies and feedback loops. However, in many relevant settings, a BN has a tree-like topology, for instance the E. coli network analyzed in Section 6. For a network with few local dependencies, assuming independence will lead to a small modeling error. Major results concerning the analysis of BNs have been obtained under the assumptions stated above, e.g., the annealed approximation [14], an important result on the spread of perturbations in random BNs. Several important results on random BNs, e.g., [14], let the network size $n$ tend to infinity; hence, there are no local dependencies.



2.2 Notation

We use $[n]$ for the set $\{1, 2, \dots, n\}$, and all sets are subsets of $[n]$. With $\sum_{S \subseteq A}(\cdot)$, we mean the sum over all sets $S$ that are subsets of $A$. Throughout this paper, we use capital letters for random variables, e.g., $X$, and lowercase letters for their realizations, e.g., $x$. Boldface letters denote vectors, e.g., $\mathbf{X}$ is a random vector and $\mathbf{x}$ its realization. For a vector $\mathbf{x}$ and a set $A \subseteq [n]$, $\mathbf{x}_A$ denotes the subvector of $\mathbf{x}$ corresponding to the entries indexed by $A$.

2.3 Fourier analysis of Boolean functions

In the following, we give a short introduction to Fourier analysis of Boolean functions. Let $\mathbf{X} = (X_1, \dots, X_n)$ be a binary, product distributed random vector, i.e., the entries of $\mathbf{X}$ are independent random variables $X_i$, $i \in [n]$, with distribution $P[X_i = x_i]$, $x_i \in \{-1, +1\}$. Throughout this paper, probabilities $P[\cdot]$ and expectations $E[\cdot]$ are with respect to the distribution of $\mathbf{X}$. We denote $p_i = P[X_i = 1]$, the variance of $X_i$ by $\mathrm{Var}(X_i)$, its standard deviation by $\sigma_i = \sqrt{\mathrm{Var}(X_i)}$, and finally $\mu_i = E[X_i]$. The inner product of $f, g : \{-1, +1\}^n \to \{-1, +1\}$ with respect to the distribution of $\mathbf{X}$ is defined as

$$\langle f, g \rangle \triangleq E[f(\mathbf{X})\, g(\mathbf{X})] = \sum_{\mathbf{x} \in \{-1, +1\}^n} P[\mathbf{X} = \mathbf{x}]\, f(\mathbf{x})\, g(\mathbf{x}), \tag{2}$$

which induces the norm $\|f\| = \sqrt{\langle f, f \rangle}$. An orthonormal basis with respect to the distribution of $\mathbf{X}$ is

$$\Phi_S(\mathbf{x}) = \prod_{i \in S} \frac{x_i - \mu_i}{\sigma_i}, \quad S \subseteq [n],\ S \neq \emptyset, \qquad \Phi_S(\mathbf{x}) = 1, \quad S = \emptyset.$$

This basis was first proposed by Bahadur [15]. Thus, each Boolean function $f : \{-1, +1\}^n \to \{-1, +1\}$ can be uniquely expressed as

$$f(\mathbf{x}) = \sum_{S \subseteq [n]} \hat f(S)\, \Phi_S(\mathbf{x}), \tag{3}$$

where $\hat f(S) = \langle f, \Phi_S \rangle$ are the Fourier coefficients of $f$. Note that (3) is a representation of $f$ as a multilinear polynomial. As an example, consider the AND2 function, defined as $f_{\mathrm{AND2}}(\mathbf{x}) = 1$ if and only if $x_1 = x_2 = 1$, and let $p_1 = p_2 = 1/2$. According to (3),

$$f_{\mathrm{AND2}}(\mathbf{x}) = -\frac{1}{2} + \frac{1}{2}x_1 + \frac{1}{2}x_2 + \frac{1}{2}x_1x_2.$$

As a second example, consider PARITY2, i.e., the XOR function, defined as $f_{\mathrm{PARITY2}}(\mathbf{x}) = 1$ if $x_1 = x_2 = 1$ or if $x_1 = x_2 = -1$, and $f_{\mathrm{PARITY2}}(\mathbf{x}) = -1$ for all other choices of $\mathbf{x}$. Written as a polynomial, $f_{\mathrm{PARITY2}}(\mathbf{x}) = x_1x_2$. We conclude this section by listing properties of the basis functions that are used frequently throughout this paper.

Decomposition: Let $A \subseteq [n]$ and $S \subseteq A$, and denote $\bar{S} = A \setminus S$. Then,

$$\Phi_A(\mathbf{x}) = \Phi_S(\mathbf{x})\, \Phi_{\bar{S}}(\mathbf{x}).$$

Orthonormality: For $A, B \subseteq [n]$,

$$E[\Phi_A(\mathbf{X})\, \Phi_B(\mathbf{X})] = \begin{cases} 1, & \text{if } A = B, \\ 0, & \text{otherwise.} \end{cases}$$

Parseval's identity: For $f : \{-1, +1\}^n \to \{-1, +1\}$,

$$E[f(\mathbf{X})^2] = \|f\|^2 = \sum_{S \subseteq [n]} \hat f(S)^2 = 1.$$

3 Influence and average sensitivity

Next, we discuss measures of perturbations and their relation to the Fourier spectrum. We start with a measure of the perturbation of a single input.

Definition 1 ([16]). Define the influence of variable $i$ on the function $f$ as

$$I_i(f) = P[f(\mathbf{X}) \neq f(\mathbf{X} \oplus e_i)],$$

where $\mathbf{x} \oplus e_i$ is the vector obtained from $\mathbf{x}$ by flipping its $i$th entry.

By definition, the influence of variable $i$ is the probability that perturbing, i.e., flipping, input $i$ changes the function's output. Influence can thus be viewed as the capability of input $i$ to change the output of $f$. In BNs, usually the sum of all influences, i.e., the average sensitivity, is studied.
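The expansion (3) and Definition 1 can be checked by brute force for small $n$. The following Python sketch (helper names are our own; indices are 0-based, so the key `(0, 1)` stands for $S = \{1, 2\}$) computes $\hat f(S) = \langle f, \Phi_S \rangle$ via (2), using $\mu_i = 2p_i - 1$ and $\sigma_i = 2\sqrt{p_i(1 - p_i)}$, which assumes $0 < p_i < 1$, and evaluates the influence directly from its definition. Enumerating all $2^n$ inputs is only feasible for small $n$.

```python
import itertools
import math

def prob(x, p):
    """P[X = x] for independent X_i with P[X_i = +1] = p[i]."""
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def fourier_coefficients(f, p):
    """hat f(S) = <f, Phi_S> for every S, keyed by sorted 0-based index tuples."""
    n = len(p)
    mu = [2 * pi - 1 for pi in p]                       # mu_i = E[X_i]
    sigma = [2 * math.sqrt(pi * (1 - pi)) for pi in p]  # sigma_i, needs 0 < p_i < 1
    points = list(itertools.product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            coeffs[S] = sum(
                prob(x, p) * f(x) * math.prod((x[i] - mu[i]) / sigma[i] for i in S)
                for x in points)
    return coeffs

def influence(f, p, i):
    """Definition 1: probability that flipping entry i changes f(X)."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=len(p)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += prob(x, p)
    return total

def and2(x):
    return 1 if x[0] == 1 and x[1] == 1 else -1

coeffs = fourier_coefficients(and2, [0.5, 0.5])
print(coeffs)
print([influence(and2, [0.5, 0.5], i) for i in range(2)])
```

For $p_1 = p_2 = 1/2$, the printed coefficients match the AND2 expansion given above ($\hat f(\emptyset) = -1/2$, all other coefficients $1/2$), and both influences equal $1/2$. Parseval's identity can be checked as `sum(v * v for v in coeffs.values())`, also for nonuniform `p`.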



Heckel etal. EURASIP Journal on Bioinformatics and Systems Biology 2013, 2013:6 
http://bsb.eurasipjournals.eom/content/2013/1/6 






Definition 2. The average sensitivity of $f$ to the variables in the set $A$ is defined as

$$I_A(f) = \sum_{i \in A} I_i(f).$$

The average sensitivity of $f$ is defined as $\mathrm{as}(f) = I_{[n]}(f)$.

$I_A(f)$ captures whether flipping an input chosen uniformly at random from $A$ affects the function's output: $I_A(f)/|A|$ is the probability that such a flip changes the output. Most commonly, all inputs are taken into account, i.e., the average sensitivity $\mathrm{as}(f)$ is studied. As an example, for $p_1 = p_2 = 1/2$, $\mathrm{as}(f_{\mathrm{PARITY2}}) = 2$ and $\mathrm{as}(f_{\mathrm{AND2}}) = 1$; hence, PARITY2 is more sensitive to single perturbations than AND2. Influence and average sensitivity have the following convenient expressions in terms of Fourier coefficients.

Proposition 1 (Lemma 4.1 of [17]). For any Boolean function $f$,

$$I_i(f) = \frac{1}{\sigma_i^2} \sum_{S \subseteq [n]:\, i \in S} \hat f(S)^2. \tag{4}$$

Proposition 2. For any Boolean function $f$,

$$I_A(f) = \sum_{S \subseteq [n]} \hat f(S)^2 \sum_{i \in S \cap A} \frac{1}{\sigma_i^2}. \tag{5}$$

Proposition 2 follows directly from Proposition 1 and the definition of $I_A(f)$. From (5), we see that $\mathrm{as}(f)$ is large if the Fourier weight is concentrated on the coefficients of high degree $d = |S|$, i.e., if $\sum_{S:\,|S| \geq d} \hat f(S)^2$ is large (i.e., close to one). In this case, Parseval's identity implies that the $\hat f(S)^2$ with $|S| < d$ must be small. As an example, suppose $p_1 = p_2 = p_3 = 1/2$ and consider the AND3 function, i.e., $f_{\mathrm{AND3}}(x_1, x_2, x_3) = 1$ if and only if $x_1 = x_2 = x_3 = 1$. AND3 is tolerant to perturbations since $\mathrm{as}(f_{\mathrm{AND3}}) = 0.75$, and, as Figure 1 shows, its spectrum is concentrated on the coefficients of low degree. In contrast, for $f_{\mathrm{PARITY3}}(x_1, x_2, x_3) = x_1x_2x_3$, $\mathrm{as}(f_{\mathrm{PARITY3}}) = 3$. Hence, PARITY3 is maximally sensitive to perturbations. Figure 1 shows that its spectrum is maximally concentrated on the coefficient of highest degree.

According to (5), $\mathrm{as}(f)$ is small only if the Fourier weight is concentrated on the coefficients of low degree. This is the case either if $f$ is strongly biased (i.e., if $f(\mathbf{x}) = a$ for most inputs $\mathbf{x}$, where $a \in \{-1, 1\}$ is a constant) or if $f$ depends on few variables only. This is in accordance with the results of Kauffman [1], who found that a random BN operates in the ordered regime if the functions in the network depend, on average, on few variables.
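Propositions 1 and 2 can be checked numerically in the same brute-force way. The sketch below (our own helper names, 0-based indices, $0 < p_i < 1$ assumed) compares the influence from Definition 1 with Proposition 1 in the form $I_i(f) = \sigma_i^{-2} \sum_{S \ni i} \hat f(S)^2$, i.e., the case $A = \{i\}$ of (5), and reproduces $\mathrm{as}(f_{\mathrm{AND3}}) = 0.75$ and $\mathrm{as}(f_{\mathrm{PARITY3}}) = 3$ for uniform inputs.

```python
import itertools
import math

def prob(x, p):
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def fourier_coefficients(f, p):
    """hat f(S) = <f, Phi_S>, keyed by sorted 0-based index tuples S."""
    n = len(p)
    mu = [2 * pi - 1 for pi in p]
    sigma = [2 * math.sqrt(pi * (1 - pi)) for pi in p]
    points = list(itertools.product((-1, 1), repeat=n))
    return {S: sum(prob(x, p) * f(x) * math.prod((x[i] - mu[i]) / sigma[i] for i in S)
                   for x in points)
            for k in range(n + 1) for S in itertools.combinations(range(n), k)}

def influence(f, p, i):
    """Definition 1, by enumeration."""
    return sum(prob(x, p) for x in itertools.product((-1, 1), repeat=len(p))
               if f(x) != f(x[:i] + (-x[i],) + x[i + 1:]))

def sensitivity_fourier(f, p, A):
    """I_A(f) from the Fourier weights as in (5); A = {i} gives Proposition 1."""
    var = [4 * pi * (1 - pi) for pi in p]  # sigma_i^2 = Var(X_i)
    return sum(c * c * sum(1 / var[i] for i in S if i in A)
               for S, c in fourier_coefficients(f, p).items())

def and3(x):
    return 1 if x == (1, 1, 1) else -1

def parity3(x):
    return x[0] * x[1] * x[2]

uniform = [0.5] * 3
for f in (and3, parity3):
    direct = sum(influence(f, uniform, i) for i in range(3))
    print(f.__name__, direct, sensitivity_fourier(f, uniform, {0, 1, 2}))
```

The same comparison also holds for nonuniform `p`, e.g., `[0.2, 0.6, 0.9]`, where the factor $1/\sigma_i^2$ no longer equals one.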

Figure 1: The Fourier spectrum of AND3 and PARITY3.

We will state our results for measures of single perturbations. However, these results also apply to other noise models, specifically to the noise sensitivity of $f$, because the noise sensitivity of $f$ is small if $f$ is tolerant to single perturbations. The noise sensitivity of a Boolean function is defined as the probability that the function's output changes if each input is flipped independently with probability $\epsilon$. For uniformly distributed $\mathbf{X}$, $\epsilon\, \mathrm{as}(f)$ is an upper bound on the noise sensitivity; for small values of $\epsilon$, $\epsilon\, \mathrm{as}(f)$ approximates the noise sensitivity well. For $X_i$ that are identically but possibly nonuniformly distributed and a slightly different noise model, it was found in [18] that $\epsilon\, \mathrm{as}(f)$ still upper-bounds the noise sensitivity. This result was generalized to product distributed $\mathbf{X}$ in [19].
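To illustrate the noise-sensitivity remark for uniformly distributed $\mathbf{X}$, the sketch below computes the noise sensitivity exactly, by enumerating all inputs and all flip patterns, and compares it with $\epsilon\, \mathrm{as}(f)$. It covers only the uniform case and the independent-flip noise model described above, not the variants studied in [18,19]; function names are our own.

```python
import itertools

def noise_sensitivity(f, n, eps):
    """Exact P[f(X) != f(Y)] for uniform X, where Y is obtained from X by
    flipping each entry independently with probability eps."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        for flips in itertools.product((False, True), repeat=n):
            y = tuple(-xi if flip else xi for xi, flip in zip(x, flips))
            if f(x) != f(y):
                k = sum(flips)  # number of flipped entries
                total += eps ** k * (1 - eps) ** (n - k) / 2 ** n
    return total

def average_sensitivity(f, n):
    """as(f) for uniform X: sum over i of P[flipping entry i changes f(X)]."""
    changes = sum(f(x) != f(x[:i] + (-x[i],) + x[i + 1:])
                  for x in itertools.product((-1, 1), repeat=n) for i in range(n))
    return changes / 2 ** n

def and3(x):
    return 1 if x == (1, 1, 1) else -1

def parity3(x):
    return x[0] * x[1] * x[2]

for f in (and3, parity3):
    for eps in (0.01, 0.1, 0.3):
        print(f.__name__, eps, noise_sensitivity(f, 3, eps), eps * average_sensitivity(f, 3))
```

For PARITY3 and $\epsilon = 0.1$, the output changes exactly when an odd number of inputs is flipped, which gives $(1 - (1 - 2\epsilon)^3)/2 = 0.244$, below the bound $\epsilon\, \mathrm{as}(f) = 0.3$.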

4 Mutual information and uncertainty

In this section, we study the determinative power of a subset of variables $\mathbf{X}_A$, where $\mathbf{X}_A$ consists of the entries of $\mathbf{X}$ corresponding to the indices in the set $A \subseteq [n]$, over the function's output $f(\mathbf{X})$. As a measure of determinative power, we take the mutual information $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ between $f(\mathbf{X})$ and $\mathbf{X}_A$, since $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ quantifies the statistical dependence between the random variables $\mathbf{X}_A$ and $f(\mathbf{X})$. Hence, this section is devoted to the study of

$$\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A).$$

Before giving a formal definition of mutual information, let us start with an example. Consider the PARITY2 function and let its inputs $X_1, X_2$ be uniformly distributed. Intuitively, if $X_1$ has determinative power, knowledge about $X_1$ should provide us with information about $f_{\mathrm{PARITY2}}(\mathbf{X})$. Suppose we know the value of $X_1$, say $X_1 = 1$. Since $f_{\mathrm{PARITY2}}(\mathbf{x}) = x_1x_2$ and $P[X_2 = 1] = 1/2$, we have $P[f_{\mathrm{PARITY2}}(\mathbf{X}) = 1 \mid X_1 = 1] = P[X_2 = 1] = 1/2 = P[f_{\mathrm{PARITY2}}(\mathbf{X}) = 1]$, and the same holds for $X_1 = -1$. Hence, knowledge of $X_1$ does not help to predict the value of $f_{\mathrm{PARITY2}}(\mathbf{X})$. Therefore, $X_1$ has no determinative power over $f_{\mathrm{PARITY2}}(\mathbf{X})$. Indeed, we have $\mathrm{MI}(f_{\mathrm{PARITY2}}(\mathbf{X}); X_1) = 0$.

We next define mutual information. Mutual information is the reduction of uncertainty of a random variable $Y$ due to the knowledge of $X$; therefore, we first need to define a measure of uncertainty, which is entropy. As a reference for the following definitions, see [20].

Definition 3. The entropy $H(X)$ of a discrete random variable $X$ with alphabet $\mathcal{X}$ is defined as

$$H(X) = -\sum_{x \in \mathcal{X}} P[X = x] \log_2 P[X = x].$$

Definition 4. The conditional entropy $H(Y|X)$ of a pair of discrete and jointly distributed random variables $(Y, X)$ is defined as

$$H(Y|X) = \sum_{x \in \mathcal{X}} P[X = x]\, H(Y|X = x).$$

Definition 5. The mutual information $\mathrm{MI}(Y; X)$ is the reduction of uncertainty of the random variable $Y$ due to the knowledge of $X$:

$$\mathrm{MI}(Y; X) = H(Y) - H(Y|X).$$

For a binary random variable $X$ with alphabet $\mathcal{X} = \{x_1, x_2\}$ and $p = P[X = x_1]$, we have $H(X) = h(p)$, where $h(p)$ is the binary entropy function, defined as

$$h(p) = -p \log_2 p - (1 - p) \log_2 (1 - p). \tag{6}$$



The properties of mutual information are what we intuitively expect from a measure of determinative power: if knowledge of $X_i$ reduces the uncertainty of $f(\mathbf{X})$, then $X_i$ determines the state of $f(\mathbf{X})$ to some extent, because knowledge about the state of $X_i$ then helps in predicting $f(\mathbf{X})$. Furthermore, we require from a measure of determinative power that not all variables can have large determinative power simultaneously. This is guaranteed for mutual information, as

$$\sum_{i=1}^{n} \mathrm{MI}(f(\mathbf{X}); X_i) \leq \mathrm{MI}(f(\mathbf{X}); \mathbf{X}) \leq 1, \tag{7}$$

which follows from the chain rule of mutual information (as a reference, see [20]) and the independence of the $X_i$, $i \in [n]$, together with the fact that $f(\mathbf{X})$ is binary, so that $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}) \leq H(f(\mathbf{X})) \leq 1$. Hence, if $\mathrm{MI}(f(\mathbf{X}); X_i)$ is large, i.e., close to 1, we can be sure that $X_i$ has determinative power over $f(\mathbf{X})$, since (7) implies that $\mathrm{MI}(f(\mathbf{X}); X_j)$ for $j \neq i$ must then be small.
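Definitions 3 to 5 translate directly into a brute-force computation when $\mathbf{X}$ has few entries. The following sketch (our own helper names; 0-based indices) computes $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ from the joint distribution of $(\mathbf{X}_A, f(\mathbf{X}))$, using the standard identity $H(f(\mathbf{X})|\mathbf{X}_A) = H(\mathbf{X}_A, f(\mathbf{X})) - H(\mathbf{X}_A)$, and illustrates the PARITY2 example and inequality (7).

```python
import itertools
import math
from collections import defaultdict

def prob(x, p):
    """P[X = x] for independent X_i with P[X_i = +1] = p[i]."""
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def entropy(dist):
    """Definition 3 (in bits) for a dict mapping outcomes to probabilities."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

def mutual_information(f, p, A):
    """MI(f(X); X_A) = H(f(X)) - H(f(X) | X_A), by enumeration."""
    joint = defaultdict(float)   # P[X_A = a, f(X) = y]
    marg_a = defaultdict(float)  # P[X_A = a]
    marg_y = defaultdict(float)  # P[f(X) = y]
    for x in itertools.product((-1, 1), repeat=len(p)):
        a, y, q = tuple(x[i] for i in A), f(x), prob(x, p)
        joint[(a, y)] += q
        marg_a[a] += q
        marg_y[y] += q
    # H(f(X) | X_A) = H(X_A, f(X)) - H(X_A)
    return entropy(marg_y) - (entropy(joint) - entropy(marg_a))

def parity2(x):
    return x[0] * x[1]

def and3(x):
    return 1 if x == (1, 1, 1) else -1

print(mutual_information(parity2, [0.5, 0.5], [0]))     # X_1 alone
print(mutual_information(parity2, [0.5, 0.5], [0, 1]))  # X_1 and X_2 together

p = [0.3, 0.6, 0.8]
singles = sum(mutual_information(and3, p, [i]) for i in range(3))
total = mutual_information(and3, p, [0, 1, 2])
print(singles, total)  # inequality (7): singles <= total <= 1
```

For PARITY2 with uniform inputs, $X_1$ alone carries no information about the output, while both inputs together determine it completely (one bit).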

4.1 Mutual information and the Fourier spectrum 

In order to study determinative power, its relation to measures of perturbations, and statistical dependencies, we start by characterizing the mutual information in terms of Fourier coefficients. Our results are based on the following novel characterization of entropy in terms of Fourier coefficients.

Theorem 1. Let $f$ be a Boolean function, let $\mathbf{X}$ be product distributed, and let $\mathbf{X}_A = \{X_i : i \in A\}$ be a fixed set of arguments, where $A \subseteq [n]$. Then,

$$H(f(\mathbf{X})|\mathbf{X}_A) = E\left[h\left(\frac{1}{2}\left(1 + \sum_{S \subseteq A} \hat f(S)\, \Phi_S(\mathbf{X}_A)\right)\right)\right],$$

where $h(\cdot)$ is the binary entropy function as defined in (6).

Proof. See Appendix 2. For the special case of uniformly distributed $\mathbf{X}$, a proof appears in [21], in the context of designing S-boxes. □

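Using the definition of mutual information, an immediate corollary of Theorem 1 is the following:

Corollary 1. Let $f$ be a Boolean function, let $\mathbf{X}$ be product distributed, and let $\mathbf{X}_A = \{X_i : i \in A\}$. Then,

$$\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A) = h\left(\frac{1}{2}\left(1 + \hat f(\emptyset)\right)\right) - E\left[h\left(\frac{1}{2}\left(1 + \sum_{S \subseteq A} \hat f(S)\, \Phi_S(\mathbf{X}_A)\right)\right)\right]. \tag{8}$$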

Theorem 1 (and Corollary 1) shows that the conditional entropy $H(f(\mathbf{X})|\mathbf{X}_A)$ and the mutual information $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ are functions of the coefficients $\{\hat f(S) : S \subseteq A\}$ only. This already hints at a fundamental difference from the average sensitivity, since $I_A(f)$ depends on the coefficients $\{\hat f(S) : |S \cap A| > 0\}$, according to (5).
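Corollary 1 can be cross-checked against a direct computation. The sketch below evaluates (8) from the Fourier coefficients, using that the conditional mean is $E[f(\mathbf{X})|\mathbf{X}_A = \mathbf{x}_A] = \sum_{S \subseteq A} \hat f(S)\, \Phi_S(\mathbf{x}_A)$, and compares the result with the mutual information obtained from the joint distribution. The majority function and the probabilities are arbitrary choices for illustration, and the helper names are our own.

```python
import itertools
import math
from collections import defaultdict

def prob(x, p):
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def h(q):
    """Binary entropy (6); values at or beyond 0 and 1 (e.g. from rounding) give 0."""
    if q <= 0 or q >= 1:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def fourier_coefficients(f, p):
    n = len(p)
    mu = [2 * pi - 1 for pi in p]
    sigma = [2 * math.sqrt(pi * (1 - pi)) for pi in p]
    points = list(itertools.product((-1, 1), repeat=n))
    return {S: sum(prob(x, p) * f(x) * math.prod((x[i] - mu[i]) / sigma[i] for i in S)
                   for x in points)
            for k in range(n + 1) for S in itertools.combinations(range(n), k)}

def mi_fourier(f, p, A):
    """Right-hand side of (8): only the coefficients hat f(S), S subset of A, are used."""
    coeffs = fourier_coefficients(f, p)
    mu = [2 * pi - 1 for pi in p]
    sigma = [2 * math.sqrt(pi * (1 - pi)) for pi in p]
    expected = 0.0
    for xa in itertools.product((-1, 1), repeat=len(A)):
        val = dict(zip(A, xa))
        p_a = math.prod(p[i] if val[i] == 1 else 1 - p[i] for i in A)
        # conditional mean E[f(X) | X_A = x_A] = sum over S subset of A of hat f(S) Phi_S(x_A)
        cond_mean = sum(c * math.prod((val[i] - mu[i]) / sigma[i] for i in S)
                        for S, c in coeffs.items() if set(S) <= set(A))
        expected += p_a * h((1 + cond_mean) / 2)
    return h((1 + coeffs[()]) / 2) - expected

def mi_direct(f, p, A):
    """Reference value H(f(X)) - H(f(X) | X_A) from the joint distribution."""
    p_a, p_one = defaultdict(float), defaultdict(float)  # P[X_A = a], P[X_A = a, f(X) = 1]
    for x in itertools.product((-1, 1), repeat=len(p)):
        a = tuple(x[i] for i in A)
        p_a[a] += prob(x, p)
        if f(x) == 1:
            p_one[a] += prob(x, p)
    return h(sum(p_one.values())) - sum(q * h(p_one[a] / q) for a, q in p_a.items())

def maj3(x):
    return 1 if sum(x) > 0 else -1

p = [0.3, 0.6, 0.8]
for A in ([0], [1, 2], [0, 1, 2]):
    print(A, mi_fourier(maj3, p, A), mi_direct(maj3, p, A))
```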

We next discuss $\mathrm{MI}(f(\mathbf{X}); X_i)$ based on (8). First, note that $\mathrm{MI}(f(\mathbf{X}); X_i)$ has previously been studied under the notion of information gain, as a measure of 'goodness' for split variables in greedy tree learners [22], and under the notion of informativeness, to quantify voting power [23]. According to (8), the mutual information $\mathrm{MI}(f(\mathbf{X}); X_i)$ depends only on $\hat f(\{i\})$, $\hat f(\emptyset)$, and $p_i$. In contrast, the influence $I_i(f)$ is a function of the coefficients $\{\hat f(S) : S \subseteq [n],\, i \in S\}$, according to (4). In Figure 2, we depict $\mathrm{MI}(f(\mathbf{X}); X_i)$ for $p_i = 0.3$ as a function of $\hat f(\{i\})$ and $\hat f(\emptyset)$.

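Figure 2: $\mathrm{MI}(f(\mathbf{X}); X_i)$ as a function of $\hat f(\{i\})$ and $\hat f(\emptyset)$ for $p_i = 0.3$.

It can be seen that $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$, i.e., $f(\mathbf{X})$ and $X_i$ are statistically independent, if and only if $\hat f(\{i\}) = 0$. This can be formalized as follows. $\mathrm{MI}(f(\mathbf{X}); X_i)$ is strictly convex in $\hat f(\{i\})$, which can be proven by taking the second derivative of (8) and observing that it is larger than zero for all pairs of values $(\hat f(\emptyset), \hat f(\{i\}))$ for which $\mathrm{MI}(f(\mathbf{X}); X_i)$ is defined. Next, from (8), we see that $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$ if $\hat f(\{i\}) = 0$. Since mutual information is nonnegative, a strictly convex function of $\hat f(\{i\})$ that vanishes at $\hat f(\{i\}) = 0$ vanishes nowhere else; hence, $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$ if and only if $\hat f(\{i\}) = 0$, which proves the following result.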



Corollary 2. Let $f$ be a Boolean function, and let $\mathbf{X}$ be product distributed. Then $X_i$ and $f(\mathbf{X})$ are statistically independent if and only if $\hat f(\{i\}) = 0$.

Corollary 2 also follows immediately from a more general result, namely Theorem 5, which is presented later. Recall that for PARITY2, $\mathrm{MI}(f_{\mathrm{PARITY2}}(\mathbf{X}); X_1) = 0$ and $\hat f(\{1\}) = 0$; hence, Corollary 2 comes as no surprise.

From Figure 2, it can be seen that the larger $|\hat f(\{i\})|$, the larger $\mathrm{MI}(f(\mathbf{X}); X_i)$ becomes. Formally, it follows from the convexity of $\mathrm{MI}(f(\mathbf{X}); X_i)$ and Corollary 2 that, for fixed $\hat f(\emptyset)$ and $p_i$, $\mathrm{MI}(f(\mathbf{X}); X_i)$ is increasing in $|\hat f(\{i\})|$. Hence, $X_i$ has large determinative power, i.e., $\mathrm{MI}(f(\mathbf{X}); X_i)$ is large, if and only if $|\hat f(\{i\})|$ is large (i.e., close to one). $|\hat f(\{i\})|$ is trivially maximized for the dictatorship function, i.e., for $f(\mathbf{x}) = x_i$, or its negation, i.e., $f(\mathbf{x}) = -x_i$. The output $f(\mathbf{X})$ of the dictatorship function is fully determined by $x_i$.
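Figure 2 can be approximated numerically by evaluating (8) for $A = \{i\}$, where only $\hat f(\emptyset)$, $\hat f(\{i\})$, and $p_i$ enter. In the sketch below, the two coefficients are treated as free parameters (not every pair belongs to an actual Boolean function), and pairs that would give conditional probabilities outside $[0, 1]$ are rejected. It evaluates one slice of the surface, at $\hat f(\emptyset) = 0$ and $p_i = 0.3$; the function names are our own.

```python
import math

def h(q):
    """Binary entropy function (6), with h(0) = h(1) = 0."""
    if q <= 0 or q >= 1:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def mi_single(c0, ci, p):
    """(8) for A = {i}: c0 = hat f(empty set), ci = hat f({i}), p = P[X_i = 1]."""
    mu, sigma = 2 * p - 1, 2 * math.sqrt(p * (1 - p))
    value = h((1 + c0) / 2)
    for x, px in ((1, p), (-1, 1 - p)):
        q = (1 + c0 + ci * (x - mu) / sigma) / 2   # P[f(X) = 1 | X_i = x]
        if not -1e-12 <= q <= 1 + 1e-12:
            raise ValueError("coefficients give a probability outside [0, 1]")
        value -= px * h(q)
    return value

for ci in (-0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6):
    print(ci, round(mi_single(0.0, ci, 0.3), 4))
```

The values vanish at $\hat f(\{i\}) = 0$ and grow with $|\hat f(\{i\})|$ on both sides, in line with Corollary 2 and the monotonicity argument above.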

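Next, let us consider the (trivial) case where $A = [n]$ and hence $\mathbf{X}_A = \mathbf{X}$. Since $f(\mathbf{X})$ is fully determined by $\mathbf{X}$, we have $H(f(\mathbf{X})|\mathbf{X}) = 0$ and thus $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}) = h(\frac{1}{2}(1 + \hat f(\emptyset)))$. It follows that $\mathrm{MI}(f(\mathbf{X}); \mathbf{X})$ is maximized for $\hat f(\emptyset) = 0$, i.e., for $P[f(\mathbf{X}) = 1] = 1/2$, i.e., if the variance of $f(\mathbf{X})$ is 1. In general, the closer to zero $\hat f(\emptyset)$ is, the larger the mutual information between a function's output and all its inputs becomes. Let us finally relate the conditional entropy $H(f(\mathbf{X})|\mathbf{X}_A)$ to the concentration of the Fourier weight on the coefficients $\{\hat f(S) : S \subseteq A\}$, $A \subseteq [n]$.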

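Theorem 2. Let $f$ be a Boolean function, let $\mathbf{X}$ be product distributed, and let $\mathbf{X}_A = \{X_i : i \in A\}$ be a fixed set of arguments, where $A \subseteq [n]$. Then,

$$1 - \sum_{S \subseteq A} \hat f(S)^2 \leq H(f(\mathbf{X})|\mathbf{X}_A) \leq \left(1 - \sum_{S \subseteq A} \hat f(S)^2\right)^{1/\ln(4)}.$$

Proof. See Appendix 3. □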



Theorem 2 shows that $H(f(\mathbf{X})|\mathbf{X}_A)$ can be approximated by $1 - \sum_{S \subseteq A} \hat f(S)^2$. It further shows that $H(f(\mathbf{X})|\mathbf{X}_A)$ is small if the Fourier weight is concentrated on the variables in the set $A$, i.e., if $\sum_{S \subseteq A} \hat f(S)^2$ is close to one. In contrast, as mentioned previously, for $I_A(f)$ it is relevant whether the Fourier weight is concentrated on the coefficients with high degree.
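The bounds of Theorem 2, in the form $1 - \sum_{S \subseteq A} \hat f(S)^2 \leq H(f(\mathbf{X})|\mathbf{X}_A) \leq (1 - \sum_{S \subseteq A} \hat f(S)^2)^{1/\ln 4}$, can be checked exhaustively for all $2^8 = 256$ Boolean functions of three inputs. The sketch below does this for one arbitrarily chosen product distribution and all sets $A$; helper names are our own, and the tolerance of $10^{-9}$ only absorbs floating-point error.

```python
import itertools
import math
from collections import defaultdict

def prob(x, p):
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def fourier_coefficients(f, p):
    n = len(p)
    mu = [2 * pi - 1 for pi in p]
    sigma = [2 * math.sqrt(pi * (1 - pi)) for pi in p]
    points = list(itertools.product((-1, 1), repeat=n))
    return {S: sum(prob(x, p) * f(x) * math.prod((x[i] - mu[i]) / sigma[i] for i in S)
                   for x in points)
            for k in range(n + 1) for S in itertools.combinations(range(n), k)}

def entropy(dist):
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

def conditional_entropy(f, p, A):
    """H(f(X) | X_A) = H(X_A, f(X)) - H(X_A), by enumeration."""
    joint, marg = defaultdict(float), defaultdict(float)
    for x in itertools.product((-1, 1), repeat=len(p)):
        a = tuple(x[i] for i in A)
        joint[(a, f(x))] += prob(x, p)
        marg[a] += prob(x, p)
    return entropy(joint) - entropy(marg)

def theorem2_bounds(coeffs, A):
    """Lower and upper bound of Theorem 2 from the weight on {S : S subset of A}."""
    rest = max(0.0, 1 - sum(c * c for S, c in coeffs.items() if set(S) <= set(A)))
    return rest, rest ** (1 / math.log(4))

p = [0.3, 0.6, 0.8]
points = list(itertools.product((-1, 1), repeat=3))
subsets = [A for k in range(4) for A in itertools.combinations(range(3), k)]
violations = 0
for values in itertools.product((-1, 1), repeat=len(points)):  # all 256 functions
    f = dict(zip(points, values)).__getitem__
    coeffs = fourier_coefficients(f, p)
    for A in subsets:
        lower, upper = theorem2_bounds(coeffs, A)
        if not lower - 1e-9 <= conditional_entropy(f, p, A) <= upper + 1e-9:
            violations += 1
print("violations:", violations)
```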

4.2 Relation to measures of perturbation 

Mutual information and average sensitivity are related as follows.

Theorem 3. For any Boolean function $f$ and any product distributed $\mathbf{X}$,

$$I_A(f) \geq \mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A) - \psi(\mathrm{Var}(f(\mathbf{X}))), \tag{9}$$

with

$$\psi(x) = x^{1/\ln(4)} - x. \tag{10}$$

Proof. See Appendix 4. □

Note that the term $\psi(\mathrm{Var}(f(\mathbf{X})))$ is close to zero. Specifically, for any $f(\mathbf{X})$ we have $0 \leq \psi(\mathrm{Var}(f(\mathbf{X}))) \leq 0.12$, and for settings of interest, $\psi(\mathrm{Var}(f(\mathbf{X})))$ is very close to zero, as explained in more detail in the following. Theorem 3 shows that if $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ is large (i.e., close to one), $f$ must be sensitive to perturbations of the entries of $\mathbf{X}_A$. Moreover, if $I_A(f)$ is small (i.e., if $f$ is tolerant to perturbations of the entries of $\mathbf{X}_A$), then $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ must be small (i.e., the entries of $\mathbf{X}_A$ do not have determinative power). For the case $A = [n]$, Theorem 3 states that the average sensitivity $\mathrm{as}(f)$ is lower-bounded by $\mathrm{MI}(f(\mathbf{X}); \mathbf{X})$ minus a small term.

We next discuss the special case $A = \{i\}$. Theorem 3 evaluated for $A = \{i\}$ yields a lower bound on the influence of a variable in terms of the mutual information of that variable, namely

$$I_i(f) \geq \mathrm{MI}(f(\mathbf{X}); X_i) - \psi(\mathrm{Var}(f(\mathbf{X}))). \tag{11}$$

Again, $\psi(\mathrm{Var}(f(\mathbf{X})))$ is close to zero for settings of interest, as the following argument explains. Equation (11) is not of interest for small $\mathrm{Var}(f(\mathbf{X}))$, since then $f(\mathbf{X})$ is close to a constant function (i.e., close to $f(\mathbf{X}) = 1$ or $f(\mathbf{X}) = -1$), and $\mathrm{MI}(f(\mathbf{X}); X_i) \leq H(f(\mathbf{X}))$ is small (i.e., close to zero) anyway. Hence, (11) is of interest when $\mathrm{Var}(f(\mathbf{X}))$ is large, i.e., close to 1; in this case, the term $\psi(\mathrm{Var}(f(\mathbf{X})))$ is small (e.g., for $\mathrm{Var}(f(\mathbf{X})) \geq 0.8$, $\psi(\mathrm{Var}(f(\mathbf{X}))) \leq 0.052$). Observe that, according to (11), if $\mathrm{MI}(f(\mathbf{X}); X_i)$ is large, then $I_i(f)$ is also large. This confirms the intuitive idea that if an input determines $f(\mathbf{X})$ to some extent, this input must be sensitive to perturbations. Conversely, as mentioned previously, an input $i$ can have large influence while $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$; e.g., for the PARITY2 function, we have $I_1(f) = 1$ and $\mathrm{MI}(f(\mathbf{X}); X_1) = 0$.
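The numerical claims about $\psi$ and inequality (9) can be checked in the same exhaustive way. The sketch below evaluates $\psi(x) = x^{1/\ln 4} - x$ on a fine grid and then computes the smallest slack of $I_A(f) \geq \mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A) - \psi(\mathrm{Var}(f(\mathbf{X})))$ over all 256 functions of three inputs and all nonempty $A$, under one arbitrarily chosen product distribution. It uses $\mathrm{Var}(f(\mathbf{X})) = 4P[f(\mathbf{X}) = 1]P[f(\mathbf{X}) = -1]$, which holds for $\{-1, +1\}$-valued $f$; helper names are our own.

```python
import itertools
import math
from collections import defaultdict

def psi(x):
    """(10): psi(x) = x^(1/ln 4) - x for x in [0, 1]."""
    return x ** (1 / math.log(4)) - x

psi_max = max(psi(k / 100000) for k in range(100001))
print(psi_max, psi(0.8))

def prob(x, p):
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def h(q):
    return 0.0 if q <= 0 or q >= 1 else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def influence(f, p, i):
    return sum(prob(x, p) for x in itertools.product((-1, 1), repeat=len(p))
               if f(x) != f(x[:i] + (-x[i],) + x[i + 1:]))

def mutual_information(f, p, A):
    p_a, p_one = defaultdict(float), defaultdict(float)  # P[X_A = a], P[X_A = a, f(X) = 1]
    for x in itertools.product((-1, 1), repeat=len(p)):
        a = tuple(x[i] for i in A)
        p_a[a] += prob(x, p)
        if f(x) == 1:
            p_one[a] += prob(x, p)
    return h(sum(p_one.values())) - sum(q * h(p_one[a] / q) for a, q in p_a.items())

p = [0.3, 0.6, 0.8]
points = list(itertools.product((-1, 1), repeat=3))
worst_slack = math.inf
for values in itertools.product((-1, 1), repeat=len(points)):  # all 256 functions
    f = dict(zip(points, values)).__getitem__
    p_plus = sum(prob(x, p) for x in points if f(x) == 1)
    var_f = max(0.0, 4 * p_plus * (1 - p_plus))  # clamp tiny negative rounding errors
    infl = [influence(f, p, i) for i in range(3)]
    for k in range(1, 4):
        for A in itertools.combinations(range(3), k):
            slack = sum(infl[i] for i in A) - (mutual_information(f, p, A) - psi(var_f))
            worst_slack = min(worst_slack, slack)
print("smallest slack in (9):", worst_slack)
```

The grid maximum of $\psi$ is about 0.1196, consistent with the bound 0.12, and $\psi(0.8) \approx 0.0513$.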

Interestingly, the influence also has an information-theoretic interpretation. The following theorem generalizes Theorem 1 in [23].

Theorem 4. For any Boolean function $f$ and any product distributed $\mathbf{X}$,

$$I_i(f) = \frac{H(f(\mathbf{X})|\mathbf{X}_{[n] \setminus \{i\}})}{H(X_i)}.$$

Proof. See Appendix 5. For uniformly distributed $\mathbf{X}$, a proof appears in [23]. □

Theorem 4 shows that the influence of a variable is a measure of the uncertainty of the function's output that remains if all variables except variable $i$ are set, normalized by the uncertainty $H(X_i)$ of input $i$ itself.

4.3 Statistical independence of inputs to a Boolean function

Next, we characterize the statistical independence of $f(\mathbf{X})$ and a set of its arguments $\mathbf{X}_A$ in terms of Fourier coefficients. This result generalizes a theorem derived by Xiao and Massey [11] from uniform to product distributed $\mathbf{X}$.

Theorem 5. Let $A \subseteq [n]$ be fixed, let $f$ be a Boolean function, and let $\mathbf{X}$ be product distributed. Then, $f(\mathbf{X})$ and the inputs $\mathbf{X}_A = \{X_i : i \in A\}$ are statistically independent if and only if

$$\hat f(S) = 0 \quad \text{for all } S \subseteq A,\ S \neq \emptyset.$$

Proof. See Appendix 6. For uniformly distributed $\mathbf{X}$, i.e., $P[X_i = 1] = 1/2$ for all $i \in [n]$, Theorem 5 has been derived by Xiao and Massey [11]. Note that the proof provided here is also conceptually different from the proof for the uniform case in [11], as it does not rely on the Xiao-Massey lemma. □

Theorem 5 shows that a function and all sets of at most $d$ of its inputs are statistically independent if $\hat f(S) = 0$ for all $S$ with $1 \leq |S| \leq d$, i.e., if the nonconstant part of the spectrum lies entirely on coefficients of high degree $|S| > d$. The most prominent example is the parity function of $n$ variables, i.e., $f_{\mathrm{PARITYN}}(\mathbf{x}) = x_1 x_2 \cdots x_n$: for uniformly distributed $\mathbf{X}$, each subset of $n - 1$ or fewer arguments and $f_{\mathrm{PARITYN}}(\mathbf{X})$ are statistically independent. Conversely, if a function is concentrated on the coefficients of low degree $d = |S|$, which is the case for functions that are tolerant to perturbations, then small sets of inputs and the function's output are statistically dependent.

Theorem 5 also has an important implication for algorithms that detect functional dependencies in a BN based on estimating the mutual information from observations of the network's states, such as the algorithm presented in [12]. Theorem 5 characterizes the classes of functions for which such an algorithm may succeed and for which it will fail. Moreover, Theorem 5 shows that in a Boolean model of a genetic regulatory network, a functional dependency between a gene and a regulator cannot be detected based on the statistical dependence of a regulator $X_i$ and a gene's state $f_j(\mathbf{X})$, unless the regulatory functions are restricted to those for which $|\hat f_j(\{i\})| > 0$ holds for each relevant input $i$.


5 Unate functions 

In this section, we discuss unate, i.e., locally monotone 
functions. 

Definition 6. A Boolean function $f$ is said to be unate in $x_i$ if there is a fixed $a_i \in \{-1, +1\}$ such that

$$f(x_1, \dots, x_i = -a_i, \dots, x_n) \leq f(x_1, \dots, x_i = a_i, \dots, x_n)$$

holds for each $\mathbf{x} = (x_1, \dots, x_n) \in \{-1, +1\}^n$. $f$ is said to be unate if $f$ is unate in each variable $x_i$, $i \in [n]$.

Each linear threshold function and each nested canalizing function is unate. Moreover, most, if not all, regulatory interactions in a biological network are considered to be unate. This can be deduced from [13,24], and the basic argument is the following: if an element acts either as a repressor or as an activator for some gene, but never as both (which is a reasonable assumption for regulatory interactions [13,24]), then the function determining the gene's state is unate by definition. For unate functions, the following property holds:

Proposition 3. Let $f : \{-1, +1\}^n \to \{-1, +1\}$ be unate. Then,

$$\hat f(\{i\}) = a_i \sigma_i I_i(f), \quad \forall i \in [n], \tag{12}$$

where $a_i \in \{-1, +1\}$ is the parameter in Definition 6.

Proof. The proof goes along the same lines as the proof for monotone functions in Lemma 4.5 of [17]. □
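Definition 6 and Proposition 3 can be checked by enumeration. The sketch below (our own helper names; 0-based indices) determines the direction $a_i$ of a unate variable, if any, and compares $\hat f(\{i\})$ with $a_i \sigma_i I_i(f)$ for a hypothetical linear threshold function $\mathrm{sign}(2x_1 - x_2 + x_3 + 0.5)$ under a nonuniform product distribution.

```python
import itertools
import math

def prob(x, p):
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def unate_direction(f, n, i):
    """a_i of Definition 6 if f is unate in x_i, else None.
    For an irrelevant x_i both signs qualify; +1 is returned then."""
    increasing = decreasing = True
    for x in itertools.product((-1, 1), repeat=n):
        if x[i] == -1:
            up = x[:i] + (1,) + x[i + 1:]
            increasing = increasing and f(x) <= f(up)
            decreasing = decreasing and f(x) >= f(up)
    if increasing:
        return 1
    if decreasing:
        return -1
    return None

def influence(f, p, i):
    return sum(prob(x, p) for x in itertools.product((-1, 1), repeat=len(p))
               if f(x) != f(x[:i] + (-x[i],) + x[i + 1:]))

def first_coefficient(f, p, i):
    """hat f({i}) = E[f(X) (X_i - mu_i) / sigma_i]."""
    mu, sigma = 2 * p[i] - 1, 2 * math.sqrt(p[i] * (1 - p[i]))
    return sum(prob(x, p) * f(x) * (x[i] - mu) / sigma
               for x in itertools.product((-1, 1), repeat=len(p)))

def threshold(x):
    """Hypothetical linear threshold function sign(2 x_1 - x_2 + x_3 + 0.5)."""
    return 1 if 2 * x[0] - x[1] + x[2] + 0.5 > 0 else -1

p = [0.3, 0.6, 0.8]
for i in range(3):
    a = unate_direction(threshold, 3, i)
    sigma = 2 * math.sqrt(p[i] * (1 - p[i]))
    print(i, a, first_coefficient(threshold, p, i), a * sigma * influence(threshold, p, i))
```

The negative weight on $x_2$ shows up as $a_2 = -1$, and the sign of $\hat f(\{2\})$ follows it, as (12) predicts.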

Note that, conversely, if (12) holds for each $X_i$, $i \in [n]$, $f$ is not necessarily unate. Inserting (12) into (8) yields

$$\mathrm{MI}(f(\mathbf{X}); X_i) = h\left(\frac{1}{2}\left(1 + \hat f(\emptyset)\right)\right) - E\left[h\left(\frac{1}{2}\left(1 + \hat f(\emptyset) + a_i \sigma_i I_i(f)\, \frac{X_i - \mu_i}{\sigma_i}\right)\right)\right], \tag{13}$$

where the expectation in (13) is over $X_i$. Based on (13), the discussion from Section 4.1 on $\mathrm{MI}(f(\mathbf{X}); X_i)$ applies by using $\hat f(\{i\})$ and $a_i \sigma_i I_i(f)$ synonymously. Hence, for unate functions, the mutual information $\mathrm{MI}(f(\mathbf{X}); X_i)$ is increasing in the influence $I_i(f)$. Moreover, if $f$ is unate and $x_i$ is a relevant variable, i.e., a variable on which the function actually depends, then $|\hat f(\{i\})| = \sigma_i I_i(f) > 0$. From this fact and the same arguments as given in Section 4.1, it follows:

Theorem 6. Let $f : \{-1, +1\}^n \to \{-1, +1\}$ be unate. Then, $\mathrm{MI}(f(\mathbf{X}); X_i) \neq 0$ if and only if $x_i$ is a relevant variable.

In a Boolean model of a biological regulatory network, 
this implies that if the functions in the network are unate, 
then a regulator and the target gene must be statistically 
dependent. 
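Theorem 6 can be checked exhaustively for three inputs. The sketch below enumerates all 256 functions, keeps the unate ones, and compares the relevance of $x_i$ with the statistical dependence of $X_i$ and $f(\mathbf{X})$, tested exactly as $P[f(\mathbf{X}) = 1 | X_i = 1] \neq P[f(\mathbf{X}) = 1 | X_i = -1]$ with rational probabilities. The final line shows that the equivalence fails for the non-unate PARITY2 function with uniform inputs. Helper names and probabilities are our own choices.

```python
import itertools
import math
from fractions import Fraction

def prob(x, p):
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

def is_unate_in(f, n, i):
    """Definition 6 with either sign of a_i."""
    pairs = [(f(x), f(x[:i] + (1,) + x[i + 1:]))
             for x in itertools.product((-1, 1), repeat=n) if x[i] == -1]
    return all(lo <= hi for lo, hi in pairs) or all(lo >= hi for lo, hi in pairs)

def is_relevant(f, n, i):
    return any(f(x) != f(x[:i] + (-x[i],) + x[i + 1:])
               for x in itertools.product((-1, 1), repeat=n))

def depends_statistically(f, p, i):
    """Exact check of MI(f(X); X_i) != 0 via P[f=1 | X_i=1] != P[f=1 | X_i=-1]."""
    points = list(itertools.product((-1, 1), repeat=len(p)))
    cond = {}
    for v in (1, -1):
        mass = sum(prob(x, p) for x in points if x[i] == v)
        ones = sum(prob(x, p) for x in points if x[i] == v and f(x) == 1)
        cond[v] = ones / mass
    return cond[1] != cond[-1]

p = [Fraction(3, 10), Fraction(3, 5), Fraction(4, 5)]
points = list(itertools.product((-1, 1), repeat=3))
unate_count = mismatches = 0
for values in itertools.product((-1, 1), repeat=len(points)):
    f = dict(zip(points, values)).__getitem__
    if all(is_unate_in(f, 3, i) for i in range(3)):
        unate_count += 1
        mismatches += sum(is_relevant(f, 3, i) != depends_statistically(f, p, i) for i in range(3))
print(unate_count, "unate functions,", mismatches, "mismatches")

def parity2(x):
    return x[0] * x[1]

print(is_relevant(parity2, 2, 0), depends_statistically(parity2, [Fraction(1, 2)] * 2, 0))  # True False
```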

6 E. coli regulatory network 

In [6], the authors presented a complex computational model of the E. coli transcriptional regulatory network that controls central parts of the E. coli metabolism. The network consists of 798 nodes and 1160 edges. Of the nodes, 636 represent genes, and of the remaining 162 nodes, most (103) are external metabolites. The rest are stimuli and other state variables such as internal metabolites. The network has a layered feed-forward structure, i.e., no feedback loops exist. The elements in the first layer can be viewed as the inputs of the system, and the elements in the following seven layers are interacting genes that represent the internal state of the system. Our experiments revealed that all functions are unate; therefore, the properties derived in Section 5 apply. Note that all functions being unate is a special property of the network, since if functions are chosen uniformly at random, it is unlikely to sample a unate function, in particular if the number of inputs $n$ is large.
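The remark that a uniformly random function is unlikely to be unate can be illustrated by counting. The sketch below enumerates all $2^{2^n}$ functions for small $n$ and applies Definition 6 to each; it is pure brute force (our own helper names) and becomes infeasible beyond $n = 4$.

```python
import itertools

def is_unate(table, n):
    """Definition 6 for f given as a dict from {-1,+1}^n to {-1,+1}."""
    for i in range(n):
        pairs = [(table[x], table[x[:i] + (1,) + x[i + 1:]]) for x in table if x[i] == -1]
        if not (all(lo <= hi for lo, hi in pairs) or all(lo >= hi for lo, hi in pairs)):
            return False
    return True

def count_unate(n):
    """(number of unate functions, number of all functions) of n inputs."""
    points = list(itertools.product((-1, 1), repeat=n))
    unate = sum(is_unate(dict(zip(points, values)), n)
                for values in itertools.product((-1, 1), repeat=len(points)))
    return unate, 2 ** len(points)

for n in range(1, 5):  # n = 5 would already mean 2^32 functions
    unate, total = count_unate(n)
    print(n, unate, total, unate / total)
```

For $n = 2$, only XOR and its negation fail the test, so 14 of the 16 functions are unate; for $n = 3$ and $n = 4$, the counts are 104 of 256 and 2170 of 65536, i.e., the unate fraction drops from about 41% to about 3%.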

6.1 Determinative nodes in the E. coli network 

We first identify the input nodes that have large determinative power (we will define what that means in a network setting shortly) and then show that a small number thereof reduces the uncertainty of the network's state significantly. Specifically, we show that, on average, the entropy of the nodes' states conditioned on a small set of determinative input nodes is small.

To put this result into perspective, we perform the same experiment for random networks with the same and with different topology as the E. coli network. We denote by $X = \{X_1, \ldots, X_n\}$, $n = 145$, the set of inputs of the feed-forward network and assume that the $X_i$ are independent and uniformly distributed. The remaining variables are denoted by $Y = \{Y_1, \ldots, Y_m\}$, $m = 653$, and are functions of the inputs and the network's states, i.e., $Y_i = f_i(X, Y)$. For our analysis, the distributions of the random variables $Y_1, \ldots, Y_m$ would need to be computed, since some of those variables are arguments to other functions. This can be circumvented by defining a collapsed network, i.e., a network where each state of a node is given as a function of the input nodes only, i.e., $Y_i = f_i(X)$. The collapsed network is obtained by consecutively inserting functions into each other until each function depends only on states of nodes in the input layer, i.e., on $X$. The collapsed network reveals the dependencies of each node on the input variables.
Interestingly, the collapsed network shows that the variables chol_xt>0, salicylate, 2ddglcn_xt>0, mnnh>0, altrh>0, and his-l_xt>0 (here and in the following, we adopt the names from the original dataset), which appear to be inputs when considering the original E. coli network, turn out not to be. Consider, for example, the node salicylate. The only node dependent on salicylate is mara = ((NOT arca OR NOT fnr) OR oxyr OR salicylate). However, arca = (fnr AND NOT oxyr), and it is easily seen that mara simplifies to mara = 1.
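For a handful of inputs, a collapsed network can be obtained by brute force: evaluate the feed-forward network on every input assignment in topological order and tabulate each node as a function of the inputs alone. The Python sketch below does this for a hypothetical three-input toy network (the names `a`, `b`, `s`, `u`, `v` are made up and are not taken from the E. coli dataset) in which `v` nominally depends on the input `s` but collapses to a constant. For 145 inputs, exhaustive tabulation is infeasible, and the functions would instead have to be substituted into each other symbolically, as described above.

```python
import itertools

# Hypothetical toy network, NOT taken from the E. coli dataset.
INPUTS = ["a", "b", "s"]
NODES = {  # listed in topological order
    "u": lambda st: st["a"] and not st["b"],
    "v": lambda st: (not st["u"] or st["a"]) or st["b"] or st["s"],
}

def collapse(inputs, nodes):
    """Tabulate every node as a function of the inputs only (exhaustive)."""
    tables = {name: {} for name in nodes}
    for values in itertools.product((False, True), repeat=len(inputs)):
        state = dict(zip(inputs, values))
        for name, fn in nodes.items():
            state[name] = fn(state)  # upstream nodes are already evaluated
        for name in nodes:
            tables[name][values] = state[name]
    return tables

def relevant_inputs(inputs, table):
    """Inputs whose flip changes the collapsed function for some assignment."""
    relevant = set()
    for values, out in table.items():
        for k, name in enumerate(inputs):
            flipped = values[:k] + (not values[k],) + values[k + 1:]
            if table[flipped] != out:
                relevant.add(name)
    return relevant

tables = collapse(INPUTS, NODES)
print(relevant_inputs(INPUTS, tables["u"]))  # inputs u really depends on
print(relevant_inputs(INPUTS, tables["v"]))  # empty: v is the constant True
```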

Next, we identify the determinative nodes. As argued in Section 4, $\mathrm{MI}(f_i(X); X_j)$ is a measure of the determinative power of $X_j$ over $Y_i = f_i(X)$. This motivates the definition of the determinative power of input $X_j$ over the states in the network as

$$D(j) \triangleq \sum_{i=1}^{m} \mathrm{MI}(f_i(X); X_j).$$
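As a sanity check of the definition, the Python sketch below evaluates $D(j)$ by exhaustive enumeration for a hypothetical collapsed network with $n = 4$ uniformly distributed inputs and $m = 4$ nodes (the functions in `FUNCS` are illustrative and not from the E. coli model), and then orders the inputs by decreasing determinative power.

```python
import itertools
import math

N = 4
POINTS = list(itertools.product((-1, 1), repeat=N))  # uniform, independent inputs

# Hypothetical collapsed node functions Y_i = f_i(X), with values in {-1, +1}.
FUNCS = [
    lambda x: 1 if x[0] == 1 and x[1] == 1 else -1,  # x0 AND x1
    lambda x: 1 if x[0] == 1 or x[2] == 1 else -1,   # x0 OR x2
    lambda x: -x[0],                                 # NOT x0
    lambda x: 1 if x[1] == 1 and x[3] == 1 else -1,  # x1 AND x3
]

def h(q):
    return 0.0 if q <= 0 or q >= 1 else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def mutual_information(f, j):
    """MI(f(X); X_j) in bits for X uniform on {-1,+1}^N."""
    p1 = sum(f(x) == 1 for x in POINTS) / len(POINTS)
    cond = 0.0
    for v in (-1, 1):
        sub = [x for x in POINTS if x[j] == v]
        cond += 0.5 * h(sum(f(x) == 1 for x in sub) / len(sub))
    return h(p1) - cond

def determinative_power(j):
    # D(j) = sum_i MI(f_i(X); X_j)
    return sum(mutual_information(f, j) for f in FUNCS)

D = [determinative_power(j) for j in range(N)]
order = sorted(range(N), key=lambda j: -D[j])  # the permutation tau
print([round(d, 4) for d in D], order)
```

Input 0 feeds three of the four nodes and ranks first; the node NOT x0 alone contributes a full bit to $D(0)$.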

Note that a small value of $D(j)$ implies that $X_j$ alone does not have large determinative power over the network's states, but $X_j$ may have large determinative power over the network's states in conjunction with other variables. In principle, $\sum_{i=1}^{m} \mathrm{MI}(f_i(X); X_j, X_k)$ can be large for some $j, k \in [n]$, even though $D(j)$ and $D(k)$ are equal to zero (for instance, if $f_i(X) = X_j X_k$ with $X_j, X_k$ uniform, then $\mathrm{MI}(f_i(X); X_j) = \mathrm{MI}(f_i(X); X_k) = 0$, whereas $\mathrm{MI}(f_i(X); X_j, X_k) = 1$ bit). This is, however, not possible in the E. coli network, since the functions are unate. Specifically, $\mathrm{MI}(f_i(X); X_j, X_k) \neq 0$ implies that $X_j$ or $X_k$ is a relevant variable, and according to Theorem 6, $\mathrm{MI}(f_i(X); X_j) \neq 0$ or $\mathrm{MI}(f_i(X); X_k) \neq 0$. We computed $D(j)$ for each input variable and found that $D(j)$ is large only for some inputs, such as o2_xt (37 bit), leu-l_xt (20.9 bit), glc-d_xt (19.3 bit), and glcn_xt>0 (17 bit), but is small for most other variables. Partly, this can be explained by the out-degree distribution of the input nodes (the out-degree of a node is its number of outgoing edges). However, a large out-degree does not necessarily result in a large value of $D(j)$. In fact, in the E. coli network, glc-d_xt, glcn_xt>0, and o2_xt have 99, 93, and 73 outgoing edges, respectively, yet D(glc-d_xt) = 19.3 bit and D(glcn_xt>0) = 17 bit, whereas D(o2_xt) = 37 bit.

Let $\tau$ be a permutation on $[n]$ such that $D(\tau(1)) \geq D(\tau(2)) \geq \cdots \geq D(\tau(n))$, i.e., $\tau$ orders the input nodes in descending order of their determinative power. We next consider $H(Y \mid X_{\tau(1)}, \ldots, X_{\tau(l)})$ as a function of $l$ to see whether knowledge of a small set of input nodes reduces the entropy of the overall network state significantly. $H(Y \mid X_{\tau(1)}, \ldots, X_{\tau(l)})$ has an interesting interpretation that arises as a consequence of the so-called asymptotic equipartition property [20] (as discussed in greater detail in [25]): Consider a sequence $y_1, \ldots, y_k$ of $k$ independent samples of the random variable $Y$. For $\epsilon > 0$ and $k$ sufficiently large, there exists a set $A_\epsilon^{(k)}$ of typical sequences $(y_1, \ldots, y_k)$ such that

$$|A_\epsilon^{(k)}| \leq 2^{k(H(Y) + \epsilon)}$$

and

$$P\big[(y_1, \ldots, y_k) \in A_\epsilon^{(k)}\big] > 1 - \epsilon,$$

where $|A_\epsilon^{(k)}|$ denotes the cardinality of the set $A_\epsilon^{(k)}$. This shows that the sequences obtained as samples of $Y$ are likely to fall in a set whose size is determined by the uncertainty of $Y$. Since the output layer consists of 653 nodes, the network's state space has maximal size $2^{653}$. Since $Y$ is a function of $X$, $H(Y) \leq H(X) = 145$ bit, where for the last equality we assume uniformly distributed inputs. Thus, without knowing the state of any input variable, the network's state is likely to be in a set of size at most roughly $2^{145}$. Given knowledge of the states $X_{\tau(1)}, \ldots, X_{\tau(l)}$, the state of the network is likely to be in a set of size roughly $2^{H(Y \mid X_{\tau(1)}, \ldots, X_{\tau(l)})}$. For a large network, however, $H(Y \mid X_{\tau(1)}, \ldots, X_{\tau(l)})$ is expensive to compute since, by definition,

$$H(Y \mid X_A) = -\sum_{x_A} P[X_A = x_A] \sum_{y} P[Y = y \mid X_A = x_A] \log_2 P[Y = y \mid X_A = x_A]. \qquad (14)$$

Hence, the number of terms in the sum is exponential in $n$ and $|A|$. An estimate of (14) can be obtained by sampling uniformly at random over $x_A$ and $y$. Instead, we consider the following upper bound, which is computationally inexpensive to evaluate:

$$H(Y \mid X_{\tau(1)}, \ldots, X_{\tau(l)}) \leq A(l)$$

with

$$A(l) \triangleq \sum_{i=1}^{m} H(Y_i \mid X_{\tau(1)}, \ldots, X_{\tau(l)}).$$

The bound above follows from the chain rule for entropy together with the fact that conditioning does not increase entropy [20]. $H(Y_i \mid X_{\tau(1)}, \ldots, X_{\tau(l)})$ is computationally inexpensive to compute, since $Y_i$ depends on few variables only (in the E. coli network, on at most 8). For the E. coli network, $A(l)$ is depicted in Figure 3 as a function of $l$. Figure 3 shows that knowledge of the states of the most determinative nodes reduces the uncertainty about the network's states significantly. Moreover, since the $Y_i$ share inputs and are therefore dependent, the upper bound $A(l)$ is loose; hence, we even expect $H(Y \mid X_{\tau(1)}, \ldots, X_{\tau(l)})$ to lie significantly below $A(l)$. Also, note that when $A(l)$ is small, $H(Y_i \mid X_{\tau(1)}, \ldots, X_{\tau(l)})$ must be small on average; hence, $P[Y_i = 1 \mid X_{\tau(1)}, \ldots, X_{\tau(l)}]$ is close to one or zero on average.

Figure 3 The upper bound $A(l)$ on $H(Y \mid X_{\tau(1)}, \ldots, X_{\tau(l)})$ as a function of $l$ for the E. coli network and random networks.
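To illustrate how the exact conditional entropy (14) compares with the bound $A(l)$, the Python sketch below computes both by exhaustive enumeration for a hypothetical collapsed network with $n = 4$ uniform inputs (the functions are illustrative, not E. coli functions, and the ordering `TAU` is the one induced by $D(j)$ for these functions). Enumeration is feasible only for tiny $n$ and $m$, which is why the text relies on $A(l)$.

```python
import itertools
import math
from collections import defaultdict

N = 4
POINTS = list(itertools.product((-1, 1), repeat=N))  # uniform, independent inputs

# Hypothetical collapsed node functions Y_i = f_i(X) with values in {-1, +1}.
FUNCS = [
    lambda x: 1 if x[0] == 1 and x[1] == 1 else -1,
    lambda x: 1 if x[0] == 1 or x[2] == 1 else -1,
    lambda x: -x[0],
    lambda x: 1 if x[1] == 1 and x[3] == 1 else -1,
]
TAU = [0, 1, 2, 3]  # inputs ordered by decreasing D(j) for these functions

def cond_entropy(outputs, A):
    """H(outputs(X) | X_A) in bits, enumerating x as in Eq. (14).
    `outputs` maps an input vector x to a hashable tuple of node states."""
    groups = defaultdict(lambda: defaultdict(int))
    for x in POINTS:
        groups[tuple(x[i] for i in A)][outputs(x)] += 1
    H = 0.0
    for counts in groups.values():
        mass = sum(counts.values())
        for c in counts.values():
            q = c / mass  # P[Y = y | X_A = x_A]
            H -= (mass / len(POINTS)) * q * math.log2(q)
    return H

def exact(l):
    return cond_entropy(lambda x: tuple(f(x) for f in FUNCS), TAU[:l])

def bound(l):
    # A(l) = sum_i H(Y_i | X_tau(1), ..., X_tau(l))
    return sum(cond_entropy(lambda x, f=f: (f(x),), TAU[:l]) for f in FUNCS)

for l in range(N + 1):
    print(l, round(exact(l), 4), round(bound(l), 4))
```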

To put $A(l)$ for the E. coli network in Figure 3 into perspective, we compute $A(l)$ for random networks. First, we took the E. coli network and exchanged each function with one chosen uniformly at random from the set of all Boolean functions of the corresponding in-degree. We also exchanged each function with one chosen uniformly at random from all unate functions. We repeated each of these two experiments for 25 random choices of functions. The mean of $A(l)$, along with one standard deviation from the mean (dashed lines), is plotted in Figure 3 for random and random unate functions. It is seen that fewer inputs determine the output of the original E. coli network compared to its random counterparts. For example, to obtain $A(l) = 50$, about twice as many inputs need to be known if the functions in the E. coli network are exchanged for functions chosen uniformly at random.

Next, we generated random feed-forward networks with $m = 653$ outputs and $n = 145$ inputs, each input having out-degree 8, i.e., the average out-degree of the inputs in the collapsed E. coli network. Again, we computed $A(l)$ for 25 choices of random and random unate functions, respectively. The mean and one standard deviation from the mean are depicted in Figure 3. The results show that, as expected, for a random feed-forward network, there seems to be no small set of inputs that determines the outputs.

Figure 4 Average sensitivity in the E. coli network. Pairs of values $(as(f), \Pr[f(X) = 1])$ of each function in the E. coli network for different in-degrees $K$ and uniformly distributed $X$. Moreover, a lower bound on the average sensitivity $as(f)$, i.e., Poincaré's inequality, is plotted.

6.2 Tolerance to perturbations 

Finally, we discuss the average sensitivity of individual functions in the E. coli network. In Section 3, we found that the average sensitivity is small if the Fourier spectrum is concentrated on the coefficients of low degree. This appears to be the case for functions that are highly biased and for functions that depend on few variables only. Figure 4 shows pairs of values $(as(f), \Pr[f(X) = 1])$ for each function in the E. coli network, again assuming that the $X_i$ are independent and uniformly distributed. We can see from Figure 4 that the average sensitivity of all functions is close to the lower bound on the average sensitivity. Note that the functions with high in-degree $K$ (i.e., number of relevant input variables), which could have average sensitivity up to $K$, also have small average sensitivity, because those functions are highly biased. We therefore conclude that the functions have small average sensitivity either because they depend on few variables only or because they are highly biased. For other input distributions, i.e., other values of $p = P[X_i = 1]$ for all $i \in [n]$, we obtained the same results.
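The effect of bias can be made concrete with two extreme functions of $K = 8$ inputs under uniformly distributed $X$: the AND of all inputs is highly biased and has $as(f) = K\,2^{1-K}$, while the unbiased parity function attains the maximum $as(f) = K$. The Python sketch below computes $as(f) = \sum_i I_i(f)$ by enumeration and compares it with the Poincaré lower bound $as(f) \geq \mathrm{Var}(f(X)) = 4\Pr[f(X) = 1](1 - \Pr[f(X) = 1])$, which we assume is the form of the bound plotted in Figure 4 for uniform inputs; the function names are hypothetical.

```python
import itertools

K = 8
POINTS = list(itertools.product((-1, 1), repeat=K))  # uniform inputs

def and_all(x):
    return 1 if all(v == 1 for v in x) else -1

def parity(x):
    r = 1
    for v in x:
        r *= v
    return r

def average_sensitivity(f):
    """as(f) = sum_i I_i(f), with I_i(f) = P[f(X) != f(X with bit i flipped)]."""
    flips = 0
    for x in POINTS:
        for i in range(K):
            flips += f(x) != f(x[:i] + (-x[i],) + x[i + 1:])
    return flips / len(POINTS)

def prob_one(f):
    return sum(f(x) == 1 for x in POINTS) / len(POINTS)

def poincare_bound(f):
    # Var(f(X)) = 4 p (1 - p) for a {-1,+1}-valued f with p = P[f(X) = 1]
    p = prob_one(f)
    return 4 * p * (1 - p)

for name, f in [("AND", and_all), ("parity", parity)]:
    print(name, average_sensitivity(f), prob_one(f), round(poincare_bound(f), 4))
```

AND has $as(f) = 0.0625$ although it depends on all eight inputs, because it is almost always $-1$; this mirrors the explanation above for the high in-degree functions.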

7 Conclusion 

In a Boolean network, tolerance to perturbations, determinative power, and statistical dependencies between nodes are properties of single functions in a probabilistic setting. Hence, we analyzed single functions with product-distributed arguments. We used Fourier analysis of Boolean functions to study the mutual information between a function $f(X)$ and a set of its inputs $X_A$ as a measure of the determinative power of $X_A$ over $f(X)$. We related the mutual information to the Fourier spectrum and proved that the mutual information lower bounds the influence, a measure of perturbation. We also gave necessary and sufficient conditions for statistical independence of $f(X)$ and $X_A$. For the class of unate functions, which are particularly interesting for biological networks, we found that mutual information and influence are directly related (not just via an inequality). We also found that $\mathrm{MI}(f(X); X_i) > 0$ for each relevant input $i$, which, as an application, implies that in a unate regulatory network, a gene and its regulator must be statistically dependent. As an application of our results, we analyzed the large-scale regulatory network of E. coli. We identified the most determinative input nodes in the network and found that it is sufficient to know only a small subset of those in order to reduce the uncertainty of the overall network state significantly. This, in turn, significantly reduces the size of the set of states in which the network is likely to be found.

A possible direction for future work is to provide an analysis similar to that of the E. coli regulatory network for other Boolean models of biological networks, and to see whether conclusions similar to those in Section 6 can be reached. One of the main assumptions in our work is the independence among the input variables of the network. It would be interesting to provide methods that can be used beyond this setup. However, deriving such results is challenging because, for dependent inputs, the basis functions $\Phi_S(x)$ do not factorize as in (3), and many results cited and derived in this paper make use of this particular form of the basis functions. In this paper, we focused on generic properties of information-processing networks that may help identify possible principles that underlie biological networks. Assessing our findings from a biological perspective would be an interesting next step.

Appendices 

Appendix 1 
Lemma 1 

For the proof of Theorems 1 and 5, we will need the following lemma:

Lemma 1. Let $f$ be a Boolean function, let $X$ be product distributed, and let $A \subseteq [n]$ and some fixed $x_A \in \{-1,+1\}^{|A|}$ be given. Then,

$$\mathbb{E}[f(X) \mid X_A = x_A] = \sum_{S \subseteq A} \hat{f}(S)\, \Phi_S(x_A). \qquad (15)$$



Proof. Inserting the Fourier expansion of $f(X)$ given by (3) in the left-hand side of (15) and utilizing the linearity of conditional expectation yields

$$\mathbb{E}[f(X) \mid X_A = x_A] = \sum_{S \subseteq [n]} \hat{f}(S)\, \mathbb{E}[\Phi_S(X) \mid X_A = x_A].$$

For $S \subseteq A$,

$$\mathbb{E}[\Phi_S(X) \mid X_A = x_A] = \Phi_S(x_A).$$

Conversely, for $S \not\subseteq A$,

$$\mathbb{E}[\Phi_S(X) \mid X_A = x_A] = 0.$$

To see this, note that since $S \not\subseteq A$, there exists some $j \in S$ with $j \notin A$. Using the decomposition property of the basis function as given in Section 2.3 and the independence of the $X_i$,

$$\mathbb{E}[\Phi_S(X) \mid X_A = x_A] = \mathbb{E}\Big[\prod_{i \in S} \Phi_{\{i\}}(X) \,\Big|\, X_A = x_A\Big] = \prod_{i \in S} \mathbb{E}[\Phi_{\{i\}}(X) \mid X_A = x_A],$$

which is equal to zero, as the factor for $i = j$ satisfies

$$\mathbb{E}[\Phi_{\{j\}}(X) \mid X_A = x_A] = \mathbb{E}[\Phi_{\{j\}}(X)] = 0.$$ □
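Lemma 1 can be checked numerically. The Python sketch below assumes the product-distribution basis $\Phi_S(x) = \prod_{i \in S} (x_i - \mu_i)/\sigma_i$ with $\mu_i = 2p_i - 1$ and $\sigma_i = \sqrt{1 - \mu_i^2}$ (our reading of (3), which is not reproduced in this section), computes $\hat{f}(S) = \mathbb{E}[f(X)\Phi_S(X)]$ for a hypothetical function and input probabilities, and compares both sides of (15) for every $A$ and every $x_A$.

```python
import itertools
import math

P = [0.2, 0.7, 0.5]  # hypothetical P[X_i = +1]
N = len(P)
POINTS = list(itertools.product((-1, 1), repeat=N))

def f(x):  # arbitrary (non-unate) example function
    return 1 if (x[0] == 1) != (x[1] == 1 and x[2] == -1) else -1

def prob(x):
    r = 1.0
    for xi, pi in zip(x, P):
        r *= pi if xi == 1 else 1 - pi
    return r

def phi(S, x):
    # Phi_S(x) = prod_{i in S} (x_i - mu_i) / sigma_i
    r = 1.0
    for i in S:
        mu = 2 * P[i] - 1
        r *= (x[i] - mu) / math.sqrt(1 - mu * mu)
    return r

def subsets(A):
    return [S for k in range(len(A) + 1) for S in itertools.combinations(A, k)]

FHAT = {S: sum(prob(x) * f(x) * phi(S, x) for x in POINTS) for S in subsets(range(N))}

def max_lemma1_error():
    worst = 0.0
    for A in subsets(range(N)):
        for x in POINTS:  # x fixes x_A; coordinates outside A are ignored
            match = [y for y in POINTS if all(y[i] == x[i] for i in A)]
            mass = sum(prob(y) for y in match)
            lhs = sum(prob(y) * f(y) for y in match) / mass    # E[f(X) | X_A = x_A]
            rhs = sum(FHAT[S] * phi(S, x) for S in subsets(A))  # Eq. (15)
            worst = max(worst, abs(lhs - rhs))
    return worst

print(max_lemma1_error())
```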



Appendix 2 
Proof of Theorem 1 

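First,

$$P[f(X) = 1 \mid X_A = x_A] = \frac{1}{2}\big(1 + \mathbb{E}[f(X) \mid X_A = x_A]\big) = \frac{1}{2}\Big(1 + \sum_{S \subseteq A} \hat{f}(S)\, \Phi_S(x_A)\Big) \triangleq q(x_A), \qquad (16)$$

where (16) follows from an application of Lemma 1. By definition of the conditional entropy,

$$\begin{aligned}
H(f(X) \mid X_A) &= \sum_{x_A \in \{-1,+1\}^{|A|}} P[X_A = x_A]\, H(f(X) \mid X_A = x_A) \\
&= \sum_{x_A \in \{-1,+1\}^{|A|}} P[X_A = x_A]\, h\big(P[f(X) = 1 \mid X_A = x_A]\big) \\
&= \sum_{x_A \in \{-1,+1\}^{|A|}} P[X_A = x_A]\, h(q(x_A)) \qquad (17) \\
&= \mathbb{E}[h(q(X_A))], \qquad (18)
\end{aligned}$$

where $h(\cdot)$ is the binary entropy function as defined in (6). To obtain (17), we used (16). The expectation in (18) is with respect to the distribution of $X_A$. Inserting $q(X_A)$ as given by (16) in (18) concludes the proof.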
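Theorem 1 expresses $H(f(X) \mid X_A)$ through the Fourier coefficients alone. The Python sketch below, assuming the product-distribution basis $\Phi_S(x) = \prod_{i \in S} (x_i - \mu_i)/\sigma_i$ and a hypothetical function `f` and probabilities `P`, compares $\mathbb{E}[h(q(X_A))]$ with $q$ computed from (16) against the conditional entropy computed directly from the joint distribution, for every $A$.

```python
import itertools
import math

P = [0.2, 0.7, 0.5]  # hypothetical P[X_i = +1]
N = len(P)
POINTS = list(itertools.product((-1, 1), repeat=N))

def f(x):  # arbitrary example function
    return 1 if (x[0] == 1) != (x[1] == 1 and x[2] == -1) else -1

def prob(x):
    r = 1.0
    for xi, pi in zip(x, P):
        r *= pi if xi == 1 else 1 - pi
    return r

def phi(S, x):
    # Phi_S(x) = prod_{i in S} (x_i - mu_i) / sigma_i  (assumed basis)
    r = 1.0
    for i in S:
        mu = 2 * P[i] - 1
        r *= (x[i] - mu) / math.sqrt(1 - mu * mu)
    return r

def subsets(A):
    return [S for k in range(len(A) + 1) for S in itertools.combinations(A, k)]

def h(q):
    q = min(max(q, 0.0), 1.0)  # clamp floating-point noise into [0, 1]
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

FHAT = {S: sum(prob(x) * f(x) * phi(S, x) for x in POINTS) for S in subsets(range(N))}

def both_sides(A):
    """Return (H(f(X) | X_A) from the joint distribution, E[h(q(X_A))] via Eq. (16))."""
    direct = via_fourier = 0.0
    for xA in itertools.product((-1, 1), repeat=len(A)):
        match = [y for y in POINTS if all(y[i] == v for i, v in zip(A, xA))]
        mass = sum(prob(y) for y in match)  # P[X_A = x_A]
        direct += mass * h(sum(prob(y) for y in match if f(y) == 1) / mass)
        q = 0.5 * (1 + sum(FHAT[S] * phi(S, match[0]) for S in subsets(A)))
        via_fourier += mass * h(q)
    return direct, via_fourier

for A in subsets(range(N)):
    print(A, [round(v, 6) for v in both_sides(A)])
```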



Appendix 3 
Proof of Theorem 2 

First, note that with $q(\cdot)$ as defined in (16), we have

$$\begin{aligned}
\mathbb{E}[4q(X_A)(1 - q(X_A))] &= \mathbb{E}\Big[1 - \Big(\sum_{S \subseteq A} \hat{f}(S)\, \Phi_S(X_A)\Big)^2\Big] \\
&= 1 - \sum_{S \subseteq A} \sum_{U \subseteq A} \hat{f}(S)\hat{f}(U)\, \mathbb{E}[\Phi_S(X_A)\Phi_U(X_A)] \\
&= 1 - \sum_{S \subseteq A} \hat{f}(S)^2, \qquad (19)
\end{aligned}$$

where (19) follows from the orthonormality of the basis functions.

We start by proving the lower bound in Theorem 2. Applying the lower bound on the binary entropy function $h(p) \geq 4p(1 - p)$, given in Theorem 1.2 of [26], to (18) yields

$$H(f(X) \mid X_A) = \mathbb{E}[h(q(X_A))] \geq \mathbb{E}[4q(X_A)(1 - q(X_A))],$$

and the lower bound in Theorem 2 follows using (19).

Next, we prove the upper bound in Theorem 2. Applying the upper bound on the binary entropy function $h(p) \leq (4p(1 - p))^{1/\ln 4}$, given in Theorem 1.2 of [26], to (18) yields

$$H(f(X) \mid X_A) = \mathbb{E}[h(q(X_A))] \leq \mathbb{E}\big[(4q(X_A)(1 - q(X_A)))^{1/\ln 4}\big]. \qquad (20)$$

The term $Y = 4q(X_A)(1 - q(X_A))$ in (20) is a random variable, and the function $y \mapsto y^{1/\ln 4}$ is concave, since $1/\ln 4 \approx 0.72 < 1$. An application of Jensen's inequality (see, e.g., [20]) yields $\mathbb{E}[Y^{1/\ln 4}] \leq (\mathbb{E}[Y])^{1/\ln 4}$; hence, the right-hand side of (20) can be bounded further as

$$H(f(X) \mid X_A) \leq \big(\mathbb{E}[4q(X_A)(1 - q(X_A))]\big)^{1/\ln 4}. \qquad (21)$$

Finally, the upper bound in Theorem 2 follows from combining (21) and (19).
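Combining (18) to (21) with (19) gives the two-sided bound $1 - \sum_{S \subseteq A} \hat{f}(S)^2 \leq H(f(X) \mid X_A) \leq \big(1 - \sum_{S \subseteq A} \hat{f}(S)^2\big)^{1/\ln 4}$, which we take to be the statement of Theorem 2. The Python sketch below spot-checks it for random truth tables and random product distributions, assuming the basis $\Phi_S(x) = \prod_{i \in S} (x_i - \mu_i)/\sigma_i$; the seed and all helper names are arbitrary.

```python
import itertools
import math
import random

N = 3
POINTS = list(itertools.product((-1, 1), repeat=N))
EXPONENT = 1 / math.log(4)  # 1 / ln 4, about 0.72

def subsets(A):
    return [S for k in range(len(A) + 1) for S in itertools.combinations(A, k)]

def prob(x, P):
    r = 1.0
    for xi, pi in zip(x, P):
        r *= pi if xi == 1 else 1 - pi
    return r

def phi(S, x, P):
    # Phi_S(x) = prod_{i in S} (x_i - mu_i) / sigma_i  (assumed basis)
    r = 1.0
    for i in S:
        mu = 2 * P[i] - 1
        r *= (x[i] - mu) / math.sqrt(1 - mu * mu)
    return r

def h(q):
    q = min(max(q, 0.0), 1.0)
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def cond_entropy(f, P, A):
    # H(f(X) | X_A) computed directly from the joint distribution
    H = 0.0
    for xA in itertools.product((-1, 1), repeat=len(A)):
        match = [y for y in POINTS if all(y[i] == v for i, v in zip(A, xA))]
        mass = sum(prob(y, P) for y in match)
        H += mass * h(sum(prob(y, P) for y in match if f[y] == 1) / mass)
    return H

def check_theorem2(f, P):
    fhat = {S: sum(prob(x, P) * f[x] * phi(S, x, P) for x in POINTS)
            for S in subsets(range(N))}
    for A in subsets(range(N)):
        v = max(0.0, 1 - sum(fhat[S] ** 2 for S in subsets(A)))  # Eq. (19)
        H = cond_entropy(f, P, A)
        if not (v - 1e-9 <= H <= v ** EXPONENT + 1e-9):
            return False
    return True

rng = random.Random(0)
results = []
for _ in range(50):
    f = {x: rng.choice((-1, 1)) for x in POINTS}   # random truth table
    P = [rng.uniform(0.1, 0.9) for _ in range(N)]  # random product distribution
    results.append(check_theorem2(f, P))
print(all(results))
```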

Appendix 4 
Proof of Theorem 3 

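According to Proposition 2,

$$\begin{aligned}
\sum_{i \in A} I_i(f) &= \sum_{S \subseteq [n]} \hat{f}(S)^2 \sum_{i \in S \cap A} \frac{1}{\sigma_i^2} \\
&\geq \sum_{\emptyset \neq S \subseteq A} \hat{f}(S)^2\, |S \cap A| \min_{i \in A}\Big(\frac{1}{\sigma_i^2}\Big) \\
&\geq \min_{i \in A}\Big(\frac{1}{\sigma_i^2}\Big) \sum_{\emptyset \neq S \subseteq A} \hat{f}(S)^2. \qquad (22)
\end{aligned}$$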



Heckel etal. EURASIP Journal on Bioinformatics and Systems Biology 2013, 2013:6 
http://bsb.eurasipjournals.com/content/2013/1/6






Next, we rewrite the lower bound on $H(f(X) \mid X_A)$ given by Theorem 2 as

$$\sum_{\emptyset \neq S \subseteq A} \hat{f}(S)^2 \geq 1 - \hat{f}(\emptyset)^2 - H(f(X) \mid X_A). \qquad (23)$$

By adding $H(f(X)) - H(f(X))$ on the right-hand side of (23) and using the definition of mutual information, (23) becomes

$$\sum_{\emptyset \neq S \subseteq A} \hat{f}(S)^2 \geq \mathrm{MI}(f(X); X_A) - H(f(X)) + 1 - \hat{f}(\emptyset)^2. \qquad (24)$$

With $\mathrm{Var}(f(X)) = 1 - \hat{f}(\emptyset)^2$ and by using the inequality $H(f(X)) \leq (\mathrm{Var}(f(X)))^{1/\ln 4}$, given in Theorem 1.2 of [26], (24) becomes

$$\sum_{\emptyset \neq S \subseteq A} \hat{f}(S)^2 \geq \mathrm{MI}(f(X); X_A) - \Psi(\mathrm{Var}(f(X))), \qquad (25)$$

with $\Psi(\cdot)$ as defined in (10). Finally, Theorem 3 follows by combining (22) and (25).
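The chain (22) to (25) can be spot-checked numerically. The Python sketch below assumes $\Psi(v) = v^{1/\ln 4} - v$, which is the form needed for (24) to imply (25) given $\mathrm{Var}(f(X)) = 1 - \hat{f}(\emptyset)^2$ (definition (10) itself is not reproduced here), and the product-distribution basis $\Phi_S(x) = \prod_{i \in S}(x_i - \mu_i)/\sigma_i$. It computes $I_i(f)$ directly as a flip probability and checks both (25) and the combined bound $\sum_{i \in A} I_i(f) \geq \min_{i \in A} \sigma_i^{-2}\,\big(\mathrm{MI}(f(X); X_A) - \Psi(\mathrm{Var}(f(X)))\big)$ for random truth tables and product distributions; all names are hypothetical.

```python
import itertools
import math
import random

N = 3
POINTS = list(itertools.product((-1, 1), repeat=N))
EXPONENT = 1 / math.log(4)

def subsets(A):
    return [S for k in range(len(A) + 1) for S in itertools.combinations(A, k)]

def prob(x, P):
    r = 1.0
    for xi, pi in zip(x, P):
        r *= pi if xi == 1 else 1 - pi
    return r

def sigma2(P, i):
    return 1 - (2 * P[i] - 1) ** 2

def phi(S, x, P):
    r = 1.0
    for i in S:
        r *= (x[i] - (2 * P[i] - 1)) / math.sqrt(sigma2(P, i))
    return r

def h(q):
    q = min(max(q, 0.0), 1.0)
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def cond_entropy(f, P, A):
    H = 0.0
    for xA in itertools.product((-1, 1), repeat=len(A)):
        match = [y for y in POINTS if all(y[i] == v for i, v in zip(A, xA))]
        mass = sum(prob(y, P) for y in match)
        H += mass * h(sum(prob(y, P) for y in match if f[y] == 1) / mass)
    return H

def psi(v):
    return v ** EXPONENT - v  # assumed form of Psi in (10)

def influence(f, P, i):
    return sum(prob(x, P) for x in POINTS if f[x] != f[x[:i] + (-x[i],) + x[i + 1:]])

def check_theorem3(f, P):
    fhat = {S: sum(prob(x, P) * f[x] * phi(S, x, P) for x in POINTS)
            for S in subsets(range(N))}
    var = max(0.0, 1 - fhat[()] ** 2)  # Var(f(X)) = 1 - fhat(empty)^2
    H = cond_entropy(f, P, ())         # H(f(X))
    for A in subsets(range(N))[1:]:    # nonempty A
        slack = H - cond_entropy(f, P, A) - psi(var)  # MI(f(X); X_A) - Psi(Var)
        energy = sum(fhat[S] ** 2 for S in subsets(A)[1:])
        total_influence = sum(influence(f, P, i) for i in A)
        if energy < slack - 1e-9:  # Eq. (25)
            return False
        if total_influence < min(1 / sigma2(P, i) for i in A) * slack - 1e-9:
            return False
    return True

rng = random.Random(2)
results = []
for _ in range(50):
    f = {x: rng.choice((-1, 1)) for x in POINTS}
    P = [rng.uniform(0.1, 0.9) for _ in range(N)]
    results.append(check_theorem3(f, P))
print(all(results))
```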

Appendix 5 
Proof of Theorem 4 

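For notational convenience, let $A = [n] \setminus \{i\}$. By definition of the conditional entropy,

$$\begin{aligned}
H(f(X) \mid X_A) &= \sum_{x_A} P[X_A = x_A]\, H(f(X) \mid X_A = x_A) \\
&= \sum_{x_A \in \{-1,+1\}^{n-1}} P[X_A = x_A]\, h\big(P[f(X) = 1 \mid X_A = x_A]\big), \qquad (26)
\end{aligned}$$

where $h(\cdot)$ is the binary entropy function as defined in (6). Observe that

$$h\big(P[f(X) = 1 \mid X_A = x_A]\big) = h(P[X_i = 1])$$

if

$$f(x_1, \ldots, x_i = 1, \ldots, x_n) \neq f(x_1, \ldots, x_i = -1, \ldots, x_n),$$

since then $P[f(X) = 1 \mid X_A = x_A]$ equals either $P[X_i = 1]$ or $1 - P[X_i = 1]$, and $h(p) = h(1 - p)$; and

$$h\big(P[f(X) = 1 \mid X_A = x_A]\big) = 0$$

otherwise. Hence, (26) becomes

$$H(f(X) \mid X_A) = \sum_{x_A \in \{-1,+1\}^{n-1}} P[X_A = x_A]\, h(p_i)\, \mathbb{1}\{f(x) \neq f(x \oplus e_i)\},$$

where $p_i = P[X_i = 1]$, $x$ is any vector that agrees with $x_A$, and $x \oplus e_i$ is the vector obtained from $x$ by flipping its $i$th entry. Theorem 4 follows by using the definition of the influence.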
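Theorem 4 states that, once all other inputs are known, the remaining uncertainty about $f(X)$ is $h(p_i)\, I_i(f)$ with $I_i(f) = P[f(X) \neq f(X \oplus e_i)]$. The Python sketch below checks this identity by enumeration for random (hypothetical) truth tables and product distributions; the seed and helper names are arbitrary.

```python
import itertools
import math
import random

N = 4
POINTS = list(itertools.product((-1, 1), repeat=N))

def prob(x, P):
    r = 1.0
    for xi, pi in zip(x, P):
        r *= pi if xi == 1 else 1 - pi
    return r

def h(q):
    q = min(max(q, 0.0), 1.0)
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def entropy_given_others(f, P, i):
    """H(f(X) | X_A) with A = all indices except i, from the joint distribution."""
    others = [k for k in range(N) if k != i]
    H = 0.0
    for xA in itertools.product((-1, 1), repeat=N - 1):
        match = [y for y in POINTS if all(y[k] == v for k, v in zip(others, xA))]
        mass = sum(prob(y, P) for y in match)
        H += mass * h(sum(prob(y, P) for y in match if f[y] == 1) / mass)
    return H

def influence(f, P, i):
    # I_i(f) = P[f(X) != f(X with entry i flipped)]
    return sum(prob(x, P) for x in POINTS if f[x] != f[x[:i] + (-x[i],) + x[i + 1:]])

rng = random.Random(1)
max_gap = 0.0
for _ in range(20):
    f = {x: rng.choice((-1, 1)) for x in POINTS}
    P = [rng.uniform(0.05, 0.95) for _ in range(N)]
    for i in range(N):
        # Theorem 4: H(f(X) | all inputs except X_i) = h(p_i) * I_i(f)
        gap = abs(entropy_given_others(f, P, i) - h(P[i]) * influence(f, P, i))
        max_gap = max(max_gap, gap)
print(max_gap)
```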

Appendix 6 
Proof of Theorem 5 

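By definition, $f(X)$ and $X_A$ are statistically independent if and only if, for all $x_A \in \{-1,+1\}^{|A|}$,

$$P[f(X) = 1 \mid X_A = x_A] = P[f(X) = 1]. \qquad (27)$$

With

$$P[f(X) = 1 \mid X_A = x_A] = \frac{1}{2} + \frac{1}{2}\mathbb{E}[f(X) \mid X_A = x_A],$$

$P[f(X) = 1] = \frac{1}{2}\big(1 + \hat{f}(\emptyset)\big)$, and an application of Lemma 1 given in Appendix 1, (27) becomes

$$\sum_{S \subseteq A} \hat{f}(S)\, \Phi_S(x_A) = \hat{f}(\emptyset) \iff \sum_{\emptyset \neq S \subseteq A} \hat{f}(S)\, \Phi_S(x_A) = 0. \qquad (28)$$

It follows from the Fourier expansion (3), i.e., from the fact that the functions $\Phi_S$, $S \subseteq A$, form a basis for the functions of $x_A$, that (28) holds for all $x_A \in \{-1,+1\}^{|A|}$ if and only if $\hat{f}(S) = 0$ for all $\emptyset \neq S \subseteq A$, which proves the theorem.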
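Theorem 5 can be illustrated with parity functions, whose dependence structure changes with the input distribution. The Python sketch below (hypothetical helper names; it assumes the product-distribution basis $\Phi_S(x) = \prod_{i \in S}(x_i - \mu_i)/\sigma_i$) tests, for every nonempty $A$, whether $f(X)$ and $X_A$ are independent and whether all $\hat{f}(S)$ with $\emptyset \neq S \subseteq A$ vanish. For $f(x) = x_1 x_2 x_3$ with $P[X_1 = 1] = P[X_2 = 1] = 1/2$ and $P[X_3 = 1] = 0.9$ (indices 0, 1, 2 in the code), $f(X)$ is independent of each single input and of the pairs $(X_1, X_3)$ and $(X_2, X_3)$, but not of $(X_1, X_2)$.

```python
import itertools
import math

TOL = 1e-9

def subsets(A):
    return [S for k in range(len(A) + 1) for S in itertools.combinations(A, k)]

def analyse(f, P):
    """For every nonempty A, return (independent?, all coefficients on A vanish?)."""
    n = len(P)
    points = list(itertools.product((-1, 1), repeat=n))

    def prob(x):
        r = 1.0
        for xi, pi in zip(x, P):
            r *= pi if xi == 1 else 1 - pi
        return r

    def phi(S, x):
        r = 1.0
        for i in S:
            mu = 2 * P[i] - 1
            r *= (x[i] - mu) / math.sqrt(1 - mu * mu)
        return r

    fhat = {S: sum(prob(x) * f(x) * phi(S, x) for x in points) for S in subsets(range(n))}
    p_f1 = sum(prob(x) for x in points if f(x) == 1)
    result = {}
    for A in subsets(range(n))[1:]:
        independent = True
        for xA in itertools.product((-1, 1), repeat=len(A)):
            match = [y for y in points if all(y[i] == v for i, v in zip(A, xA))]
            mass = sum(prob(y) for y in match)
            cond = sum(prob(y) for y in match if f(y) == 1) / mass
            independent = independent and abs(cond - p_f1) < TOL  # Eq. (27)
        vanish = all(abs(fhat[S]) < TOL for S in subsets(A)[1:])
        result[A] = (independent, vanish)
    return result

res = analyse(lambda x: x[0] * x[1] * x[2], [0.5, 0.5, 0.9])
for A, (ind, van) in res.items():
    print(A, ind, van)
```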